
Chat Completions

Routor is a drop-in replacement for the OpenAI Chat Completions API. Any SDK or tool that works with OpenAI works with Routor once you change the base URL and API key.

An Anthropic-compatible endpoint is also available at POST /v1/messages for Claude Code and other Anthropic-format clients. It converts each request to the internal OpenAI shape, runs it through the same routing, fallback, and billing pipeline, and translates the response back to the Anthropic format.

Endpoint:
POST https://api.routor.ai/v1/chat/completions
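For callers not using an SDK, a plain HTTP request needs only the URL above, a JSON body, and the API key. The sketch below assembles the arguments for `fetch()` without sending anything. It assumes OpenAI-style `Authorization: Bearer` authentication, and `buildChatCompletionRequest` is a hypothetical helper, not part of any Routor SDK.

```typescript
// Minimal sketch: build the URL and fetch() options for a Routor chat completion.
// Assumes OpenAI-style Bearer authentication; the helper name is hypothetical.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface FetchArgs {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

const ROUTOR_BASE_URL = "https://api.routor.ai/v1";

function buildChatCompletionRequest(apiKey: string, messages: ChatMessage[]): FetchArgs {
  return {
    url: `${ROUTOR_BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // assumed: same scheme as the OpenAI API
      },
      body: JSON.stringify({ model: "auto", messages }),
    },
  };
}
```

Passing the result to `fetch(url, init)` would send the request; keeping the helper free of side effects makes it easy to test.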

Request Format

Identical to the OpenAI Chat Completions API, with optional Routor-specific fields:
{
  "model": "auto",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "Explain binary search trees." }
  ],
  "max_tokens": 1024,

  // ── Routor-specific (all optional) ──────────────
  "routor_profile":       "auto",
  "routor_tier":          "STANDARD",
  "routor_tier_floor":    "LIGHT",
  "routor_tier_ceiling":  "COMPLEX",
  "routor_max_cost":      0.05,
  "routor_bfcl_min":      0.8,
  "routor_code_quality":  2,
  "routor_chat_quality":  1
}
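Because the block above contains a `//` comment, it is illustrative rather than strict JSON. A typed sketch of the same request shape follows; the names `RoutorChatRequest`, `Tier`, and `Quality` are illustrative, not an official SDK type. Fields left unset are simply omitted from the serialized body.

```typescript
// Sketch of a request type: OpenAI fields plus the optional Routor fields.
// Type names are illustrative, not an official Routor SDK.

type Tier = "NANO" | "SIMPLE" | "LIGHT" | "STANDARD" | "COMPLEX";
type Quality = 0 | 1 | 2;

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface RoutorChatRequest {
  model: string;
  messages: ChatMessage[];
  max_tokens?: number;
  routor_profile?: "auto" | "tier" | "direct";
  routor_tier?: Tier;
  routor_tier_floor?: Tier;
  routor_tier_ceiling?: Tier;
  routor_max_cost?: number; // USD per request
  routor_bfcl_min?: number;
  routor_code_quality?: Quality;
  routor_chat_quality?: Quality;
}

const request: RoutorChatRequest = {
  model: "auto",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain binary search trees." },
  ],
  max_tokens: 1024,
  routor_tier_floor: "LIGHT",
  routor_max_cost: 0.05,
};

// Serialized body, ready to POST to /v1/chat/completions
const body = JSON.stringify(request);
```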

The model Field

Value                Behavior
"auto"               Routor picks the model (full automatic routing).
Any valid model ID   Routes directly to that model, but only when paired with "routor_profile": "direct". Without it, the model field is ignored and routing runs as usual.
Always use "auto" unless you have a specific reason to lock to a model. To force a specific model, set "routor_profile": "direct" and pass the model ID in model.
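A minimal sketch of locking a request to one model, following the rule above that the model ID is honored only with `"routor_profile": "direct"`. The helper name `forceModel` is hypothetical, and rejecting `"auto"` is a client-side choice of this sketch, not documented server behavior.

```typescript
// Sketch: lock a request to one model. The model ID is honored only when
// routor_profile is "direct"; otherwise routing runs as usual.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface DirectRequest {
  model: string;
  routor_profile: "direct";
  messages: ChatMessage[];
}

// Hypothetical helper; the model ID should come from GET /v1/models.
function forceModel(modelId: string, messages: ChatMessage[]): DirectRequest {
  if (modelId === "auto") {
    // Client-side guard: "auto" is not a concrete model ID
    throw new Error('"auto" is not a concrete model ID; omit routor_profile for automatic routing');
  }
  return { model: modelId, routor_profile: "direct", messages };
}

const req = forceModel("moonshot/kimi-k2.6", [{ role: "user", content: "Hello" }]);
```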

Routor-Specific Parameters

These fields are stripped before the request is forwarded to the provider, so the provider never sees them.
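To make the stripping concrete, the sketch below models it as a filter that drops every key starting with `routor_`. This illustrates the documented behavior and is not Routor's server code; it assumes the `routor_` prefix identifies all Routor-specific fields, and model selection is not modeled.

```typescript
// Illustration of the documented behavior: every routor_* field is removed
// before the body is forwarded. Not Routor's server code; model selection
// is not modeled here.

type Body = Record<string, unknown>;

function stripRoutorFields(body: Body): Body {
  const forwarded: Body = {};
  for (const [key, value] of Object.entries(body)) {
    if (!key.startsWith("routor_")) {
      forwarded[key] = value; // keep standard OpenAI fields unchanged
    }
  }
  return forwarded;
}

const incoming: Body = {
  model: "auto",
  messages: [{ role: "user", content: "Hi" }],
  routor_tier_floor: "LIGHT",
  routor_max_cost: 0.05,
};

const forwarded = stripRoutorFields(incoming);
```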

routor_profile

"routor_profile": "auto" | "tier" | "direct"
  • "auto" (default): Routor classifies the prompt and picks the tier.
  • "tier": use with routor_tier to skip classification and force a specific tier.
  • "direct": bypass routing entirely and proxy straight to the model named in model, which must be a valid model ID from GET /v1/models.
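The profile rules above can be checked on the client before sending. This is a sketch: `checkProfile` is a hypothetical helper, and how the server responds to an inconsistent combination (for example `"tier"` without `routor_tier`) is not documented here, so the check only reports problems.

```typescript
// Client-side sanity check for the three profiles. The rules mirror the
// documentation; the messages are illustrative.

type Profile = "auto" | "tier" | "direct";
type Tier = "NANO" | "SIMPLE" | "LIGHT" | "STANDARD" | "COMPLEX";

interface RoutingFields {
  model: string;
  routor_profile?: Profile;
  routor_tier?: Tier;
}

// Returns a list of problems; an empty list means the combination looks valid.
function checkProfile(req: RoutingFields): string[] {
  const profile = req.routor_profile ?? "auto"; // "auto" is the default
  const problems: string[] = [];
  if (profile === "tier" && req.routor_tier === undefined) {
    problems.push('routor_profile "tier" needs routor_tier');
  }
  if (profile !== "tier" && req.routor_tier !== undefined) {
    problems.push('routor_tier is only used when routor_profile is "tier"');
  }
  if (profile === "direct" && req.model === "auto") {
    problems.push('routor_profile "direct" needs a concrete model ID in model');
  }
  return problems;
}
```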

routor_tier

"routor_tier": "NANO" | "SIMPLE" | "LIGHT" | "STANDARD" | "COMPLEX"
Only used when routor_profile is "tier". Forces routing to this exact tier, bypassing classification. Example: always use STANDARD regardless of the prompt:
{
  "model": "auto",
  "routor_profile": "tier",
  "routor_tier": "STANDARD",
  "messages": [...]
}

routor_tier_floor

"routor_tier_floor": "NANO" | "SIMPLE" | "LIGHT" | "STANDARD" | "COMPLEX"
Sets the minimum tier. Any request classified below this floor is bumped up to the floor. Example: never go below STANDARD:
{
  "model": "auto",
  "routor_tier_floor": "STANDARD",
  "messages": [...]
}

routor_tier_ceiling

"routor_tier_ceiling": "NANO" | "SIMPLE" | "LIGHT" | "STANDARD" | "COMPLEX"
Sets the maximum tier. Any request classified above this ceiling is capped at the ceiling. Example: cap at LIGHT to limit cost:
{
  "model": "auto",
  "routor_tier_ceiling": "LIGHT",
  "messages": [...]
}
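A sketch of how a floor and a ceiling together bound the classified tier, assuming the tiers are ordered as listed, from NANO (lowest) to COMPLEX (highest). Behavior when the floor is above the ceiling is not documented; this sketch applies the ceiling first and the floor second, so the floor wins. `applyBounds` is a hypothetical helper.

```typescript
// Sketch: clamp a classified tier between an optional floor and ceiling.
// Assumes the listed order; the floor-above-ceiling case is an arbitrary choice here.

const TIERS = ["NANO", "SIMPLE", "LIGHT", "STANDARD", "COMPLEX"] as const;
type Tier = (typeof TIERS)[number];

function applyBounds(classified: Tier, floor?: Tier, ceiling?: Tier): Tier {
  let rank = TIERS.indexOf(classified);
  if (ceiling !== undefined) rank = Math.min(rank, TIERS.indexOf(ceiling)); // cap
  if (floor !== undefined) rank = Math.max(rank, TIERS.indexOf(floor)); // bump up
  return TIERS[rank];
}
```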

routor_max_cost

"routor_max_cost": <number>
Sets a maximum estimated cost per request, in USD. Models whose estimated cost exceeds this value are filtered out of the candidate chain. If no model qualifies, Routor falls back to the full chain.
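The filter-then-fall-back rule can be sketched as follows. How Routor estimates per-request cost is not described here, so each candidate carries a precomputed estimate in a hypothetical `estimatedCostUsd` field, and the model names are placeholders.

```typescript
// Sketch of routor_max_cost: drop candidates over budget, but keep the full
// chain if nothing qualifies. Field and model names are hypothetical.

interface Candidate {
  model: string;
  estimatedCostUsd: number;
}

function applyMaxCost(chain: Candidate[], maxCost?: number): Candidate[] {
  if (maxCost === undefined) return chain;
  const affordable = chain.filter((c) => c.estimatedCostUsd <= maxCost);
  // If nothing qualifies, fall back to the full chain rather than failing
  return affordable.length > 0 ? affordable : chain;
}

const chain: Candidate[] = [
  { model: "model-a", estimatedCostUsd: 0.08 },
  { model: "model-b", estimatedCostUsd: 0.03 },
  { model: "model-c", estimatedCostUsd: 0.01 },
];
```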

routor_bfcl_min

"routor_bfcl_min": <number>
Minimum BFCL (Berkeley Function-Calling Leaderboard) score a model must meet to stay in the candidate chain. Only meaningful when your request includes tools. If no model qualifies, Routor falls back to the full chain.
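A sketch of a request where `routor_bfcl_min` is meaningful because it includes tools. The `tools` array follows the OpenAI function-tool format; `get_weather` is a hypothetical tool, and 0.8 is the example value from the request format above.

```typescript
// Sketch: a tool-calling request that keeps only models with BFCL >= 0.8.
// get_weather is a hypothetical tool.

interface FunctionTool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>; // JSON Schema for the arguments
  };
}

const tools: FunctionTool[] = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

const toolRequest = {
  model: "auto",
  messages: [{ role: "user", content: "What's the weather in Oslo?" }],
  tools,
  routor_bfcl_min: 0.8, // meaningful here because tools is present
};
```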

routor_code_quality / routor_chat_quality

"routor_code_quality": 0 | 1 | 2
"routor_chat_quality": 0 | 1 | 2
Quality sliders that map to a tier floor. 0 = Fast (no floor), 1 = Balanced, 2 = Best.
  • routor_code_quality applies when the prompt looks like a coding task (floors: STANDARD for 1, COMPLEX for 2)
  • routor_chat_quality applies otherwise (floors: LIGHT for 1, STANDARD for 2)
The slider floor is merged with any explicit routor_tier_floor; the higher of the two wins.
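The slider mapping and the merge rule can be sketched as below. Two assumptions: an omitted slider behaves like 0 (no floor), and the tiers are ordered NANO < SIMPLE < LIGHT < STANDARD < COMPLEX. Whether a prompt looks like a coding task is decided by Routor, so here it is just a boolean argument; `effectiveFloor` is a hypothetical helper.

```typescript
// Sketch: map quality sliders to a tier floor, then merge with an explicit floor.
// Assumes an omitted slider means 0 and the tiers are ordered as listed.

const TIERS = ["NANO", "SIMPLE", "LIGHT", "STANDARD", "COMPLEX"] as const;
type Tier = (typeof TIERS)[number];
type Quality = 0 | 1 | 2;

const CODE_FLOORS: Record<Quality, Tier | undefined> = { 0: undefined, 1: "STANDARD", 2: "COMPLEX" };
const CHAT_FLOORS: Record<Quality, Tier | undefined> = { 0: undefined, 1: "LIGHT", 2: "STANDARD" };

function effectiveFloor(
  isCodingTask: boolean,
  codeQuality: Quality = 0,
  chatQuality: Quality = 0,
  explicitFloor?: Tier,
): Tier | undefined {
  // Only one slider applies, depending on the kind of prompt
  const sliderFloor = isCodingTask ? CODE_FLOORS[codeQuality] : CHAT_FLOORS[chatQuality];
  if (sliderFloor === undefined) return explicitFloor;
  if (explicitFloor === undefined) return sliderFloor;
  // The higher of the two floors wins
  return TIERS.indexOf(sliderFloor) >= TIERS.indexOf(explicitFloor) ? sliderFloor : explicitFloor;
}
```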

Response Format

Identical to the OpenAI Chat Completions response, plus a routor object with routing metadata:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1748700000,
  "model": "moonshot/kimi-k2.6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A binary search tree is a data structure where..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 142,
    "total_tokens": 170
  },
  "routor": {
    "model":       "moonshot/kimi-k2.6",
    "tier":        "LIGHT",
    "profile":     "auto",
    "confidence":  0.82,
    "savingsPct":  87.4,
    "method":      "rules"
  }
}
Note: the model field in the response shows the actual model used, not "auto". The routor object’s model may differ from the top-level model when a fallback was used.
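A sketch of reading the routing metadata from a parsed response body. Field names follow the example response above; the interfaces and `describeRouting` are illustrative, and `routor` is typed as optional so the code tolerates responses without it.

```typescript
// Sketch: summarize the routor metadata object from a parsed response.
// Interface and helper names are illustrative.

interface RoutorMetadata {
  model: string;
  tier: string;
  profile: string;
  confidence: number;
  savingsPct: number;
  method: string;
}

interface CompletionWithRoutor {
  model: string; // actual model used, never "auto"
  choices: { message: { role: string; content: string | null } }[];
  routor?: RoutorMetadata;
}

function describeRouting(res: CompletionWithRoutor): string {
  if (!res.routor) return `model=${res.model} (no routing metadata)`;
  const r = res.routor;
  return `model=${r.model} tier=${r.tier} confidence=${r.confidence} savings=${r.savingsPct}%`;
}
```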

Response Headers

Every response includes X-Request-Id. When the server has DEBUG_ROUTING=1 set, routing metadata is also exposed in headers:
X-Request-Id:           req_01abc123
X-Routor-Model:         moonshot/kimi-k2.6
X-Routor-Tier:          LIGHT
X-Routor-Confidence:    0.82
X-Routor-Profile:       auto
X-Routor-Savings:       87.4%
X-Routor-Fallback:      false
Without that flag, use the routor object in the response body (shown above) instead. See Response Metadata for the full reference.
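When the debug headers are enabled, they can be parsed into typed values. In this sketch the input is a plain object keyed by lower-case header names (for example, `Object.fromEntries(response.headers)` from the Fetch API), and every field is optional because the headers are absent without `DEBUG_ROUTING=1`. `parseRoutingHeaders` is a hypothetical helper.

```typescript
// Sketch: parse the debug routing headers. All fields are optional because
// the headers only appear when the server runs with DEBUG_ROUTING=1.

interface HeaderRouting {
  requestId?: string;
  model?: string;
  tier?: string;
  profile?: string;
  confidence?: number;
  savingsPct?: number;
  fallback?: boolean;
}

function parseRoutingHeaders(headers: Record<string, string>): HeaderRouting {
  // Fetch API header names are lower-case, so normalize before lookup
  const get = (name: string): string | undefined => headers[name.toLowerCase()];
  const confidence = get("X-Routor-Confidence");
  const savings = get("X-Routor-Savings");
  const fallback = get("X-Routor-Fallback");
  return {
    requestId: get("X-Request-Id"),
    model: get("X-Routor-Model"),
    tier: get("X-Routor-Tier"),
    profile: get("X-Routor-Profile"),
    confidence: confidence !== undefined ? Number(confidence) : undefined,
    savingsPct: savings !== undefined ? parseFloat(savings) : undefined, // "87.4%" -> 87.4
    fallback: fallback !== undefined ? fallback === "true" : undefined,
  };
}
```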

Streaming

Streaming works exactly like OpenAI streaming:
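import OpenAI from "openai";

// Point the OpenAI SDK at Routor; the ROUTOR_API_KEY variable name is just an example
const client = new OpenAI({
  baseURL: "https://api.routor.ai/v1",
  apiKey:  process.env.ROUTOR_API_KEY,
});

const stream = await client.chat.completions.create({
  model:    "auto",
  messages: [{ role: "user", content: "Write a short story about a robot." }],
  stream:   true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}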

Error Handling

Routor returns standard HTTP errors:

Status   Meaning
400      Bad request: missing or invalid fields
401      Invalid or missing API key
402      Insufficient credits
429      Rate limit exceeded
503      All providers in the fallback chain failed

Error responses carry a JSON body. Example for a 402:
{
  "error": {
    "message": "Insufficient credits. Please top up your account.",
    "type":    "billing_error"
  }
}
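One way a client might act on these status codes is sketched below. The policy (retry 429 and 503 with exponential backoff, surface 400, 401, and 402 to the user) is a suggestion rather than documented Routor behavior, and the backoff constants are arbitrary. All names are hypothetical.

```typescript
// Sketch of client-side error handling. The retry policy and constants are
// suggestions, not documented Routor behavior.

interface RoutorErrorBody {
  error: { message: string; type: string };
}

type ErrorAction = "fix_request" | "check_api_key" | "top_up_credits" | "retry_later" | "give_up";

function actionForStatus(status: number): ErrorAction {
  switch (status) {
    case 400: return "fix_request";
    case 401: return "check_api_key";
    case 402: return "top_up_credits";
    case 429: // rate limited
    case 503: // every provider in the fallback chain failed
      return "retry_later";
    default:
      return "give_up";
  }
}

// Exponential backoff delay in ms for a 0-based retry attempt, capped at capMs
function backoffMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

function describeError(status: number, raw: string): string {
  const body = JSON.parse(raw) as RoutorErrorBody;
  return `${status} ${body.error.type}: ${body.error.message} -> ${actionForStatus(status)}`;
}
```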