
How Routor Works

Every request you send with model: "auto" is handled automatically. Routor picks the right model, routes the request, and falls back silently if anything goes wrong. The whole process adds a few milliseconds before the model even sees your prompt.

[Figure: Routing pipeline - your request, the right model, in milliseconds]

The Four Steps


Step 1 - Classify

Routor reads your prompt and makes two independent decisions:
  1. Task category - what kind of work this is. Routor detects 17 categories: coding, debugging, math, planning, research, science, reasoning, writing, vision, multilingual, tool use, instruction following, and more. The category determines which model pool to draw from.
  2. Difficulty tier - how hard the task is, scored across 15 weighted dimensions (code presence, reasoning markers, technical terms, agentic tasks, constraints, etc.). The score maps to one of 5 tiers. See The 5 Tiers.
No extra API call is made. No credits are consumed for this step. The classifier runs in under 1ms.
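
As a rough illustration of the difficulty scoring described above, the sketch below computes a weighted sum over a few dimensions and maps it to one of 5 tiers. The weights, the boundaries, and the function names are all hypothetical; Routor's actual values and dimension detectors are not documented here.

```python
from bisect import bisect_right

# Hypothetical weights for 4 of the 15 dimensions (assumed values, not Routor's).
WEIGHTS = {
    "code_presence": 0.30,
    "reasoning_markers": 0.30,
    "technical_terms": 0.20,
    "constraints": 0.20,
}

# Four hypothetical boundaries split the score range into 5 tiers.
# Tier index 0 is the easiest tier, 4 the hardest.
BOUNDARIES = [0.2, 0.4, 0.6, 0.8]


def score_prompt(signals: dict[str, float]) -> float:
    """Weighted sum of per-dimension signals, each expected in [0, 1]."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)


def tier_index(score: float) -> int:
    """Map a score to a tier index 0..4 by counting the boundaries it has reached."""
    return bisect_right(BOUNDARIES, score)


# A prompt with strong code and reasoning signals, and nothing else.
score = score_prompt({"code_presence": 1.0, "reasoning_markers": 1.0})
print(score, tier_index(score))
```

Because this is plain arithmetic plus a binary search, it needs no network call, which is consistent with the sub-millisecond claim above.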

Step 2 - Filter by Capability

If your request includes images, files, audio, video, or tool calls, Routor narrows the model pool to only models that can handle it.
  • Image attached? Only vision-capable models
  • File attached? Only file-capable models
  • Tools defined? Only tool-calling models, ranked by function-calling accuracy (BFCL scores)
  • Cost cap or quality floor set? Models that don’t meet it are filtered out
Models whose context window is too small for your request, or whose provider failed startup validation, are also removed.
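
The capability filter can be pictured as a set-containment check plus two eligibility tests. This is a minimal sketch under assumed names (`Model`, `filter_pool`, and the capability strings are hypothetical); it leaves out the cost cap, quality floor, and BFCL ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                                   # hypothetical model identifier
    context_window: int                         # maximum tokens the model accepts
    capabilities: frozenset[str] = frozenset()  # e.g. {"vision", "tools"}
    provider_healthy: bool = True               # False if startup validation failed


def filter_pool(pool: list[Model], required: frozenset[str], prompt_tokens: int) -> list[Model]:
    """Keep models that are healthy, large enough, and have every required capability."""
    return [
        m
        for m in pool
        if m.provider_healthy
        and m.context_window >= prompt_tokens
        and required <= m.capabilities  # subset test: all required capabilities present
    ]


pool = [
    Model("model-a", 8_000, frozenset({"vision"})),
    Model("model-b", 200_000, frozenset({"vision", "tools"})),
    Model("model-c", 200_000),
]
# An image request of 50,000 tokens keeps only model-b in this toy pool.
print([m.name for m in filter_pool(pool, frozenset({"vision"}), 50_000)])
```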

Step 3 - Rank by Value

From the filtered pool, Routor ranks models by value - a weighted blend of quality (the model’s benchmark score on this category) and saving (how much cheaper it is than the tier’s reference model). The default balance is 50/50. The tier’s premium reference model is kept last as a safety net, so the chain runs from best value first to premium last. A LIGHT tier request might produce a chain like:
zai/glm-5.2  →  moonshot/kimi-k2.6  →  google/gemini-3.5-flash
The first model in the chain is tried first. If it fails, the next one is used automatically, escalating to the premium reference model if everything else fails.
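
A minimal sketch of the ranking and fallback logic, assuming quality and saving are both normalized to [0, 1]. The function names, the dictionary layout, and the sample numbers are hypothetical and only illustrate the 50/50 blend with the premium model held last.

```python
def value(quality: float, saving: float, quality_weight: float = 0.5) -> float:
    """Weighted blend of quality and saving; 0.5 gives the default 50/50 balance."""
    return quality_weight * quality + (1.0 - quality_weight) * saving


def build_chain(candidates: list[dict], premium: str, quality_weight: float = 0.5) -> list[str]:
    """Rank non-premium candidates by value (best first), then append the premium model."""
    ranked = sorted(
        (c for c in candidates if c["name"] != premium),
        key=lambda c: value(c["quality"], c["saving"], quality_weight),
        reverse=True,
    )
    return [c["name"] for c in ranked] + [premium]


def call_with_fallback(chain: list[str], send):
    """Try each model in order; return (model_name, response) from the first that succeeds."""
    last_error = None
    for name in chain:
        try:
            return name, send(name)
        except Exception as exc:  # any provider failure moves on to the next model
            last_error = exc
    raise RuntimeError("all models in the chain failed") from last_error


candidates = [
    {"name": "model-b", "quality": 0.80, "saving": 0.60},  # value 0.70
    {"name": "model-a", "quality": 0.70, "saving": 0.90},  # value 0.80
    {"name": "premium", "quality": 0.95, "saving": 0.00},  # reference model, always last
]
print(build_chain(candidates, premium="premium"))
```

Here `send` stands in for whatever function actually calls a provider; it is passed in so the fallback loop stays independent of any particular client.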

Step 4 - Route and Return

The request goes to the selected provider. The response comes back in the same OpenAI-compatible format, plus a routor object in the response body with routing metadata (model, tier, profile, confidence, savings). X-Request-Id is always sent as a header; the full set of X-Routor-* headers is included when your deployment has DEBUG_ROUTING=1 set. See Response Headers.
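
To show how a client might read this metadata, the sketch below pulls the routor object out of a response body and collects any X-Routor-* headers. The exact JSON key names inside the routor object are assumed to match the field names listed above; the helper name and sample values are hypothetical.

```python
import json


def routing_summary(body_text: str, headers: dict[str, str]) -> dict:
    """Collect routing metadata from a response body and its headers."""
    body = json.loads(body_text)
    meta = body.get("routor", {})  # assumed key names: model, tier, profile, confidence, savings
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    return {
        "request_id": lowered.get("x-request-id"),
        "model": meta.get("model"),
        "tier": meta.get("tier"),
        "profile": meta.get("profile"),
        "confidence": meta.get("confidence"),
        "savings": meta.get("savings"),
        # Only present when the deployment has DEBUG_ROUTING=1 set.
        "debug_headers": {k: v for k, v in lowered.items() if k.startswith("x-routor-")},
    }


# Illustrative response only; the values are made up.
sample_body = '{"choices": [], "routor": {"model": "example/model", "tier": "LIGHT", "confidence": 0.87}}'
sample_headers = {"X-Request-Id": "req-123", "X-Routor-Confidence": "0.87"}
print(routing_summary(sample_body, sample_headers))
```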

What “auto” Means

When you set model: "auto", Routor owns the model decision completely. You can constrain it if needed:
{
  "model": "auto",
  "routor_tier_floor": "STANDARD",
  "routor_tier_ceiling": "COMPLEX"
}
Or skip classification entirely and force a specific tier:
{
  "model": "auto",
  "routor_profile": "tier",
  "routor_tier": "LIGHT"
}
To bypass routing and proxy straight to a named model, use "routor_profile": "direct" with a model ID. See Chat Completions for the full parameter list.

Confidence Score

Every routing decision includes a confidence score between 0 and 1. It reflects how far the prompt’s score sat from the nearest tier boundary - a clear, decisive prompt scores high; a borderline one scores closer to 0.5. Confidence is informational. The tier is always assigned from the score’s position relative to the configured boundaries, so even ambiguous prompts get the tier their score maps to (which, for neutral prompts, is typically LIGHT). You can see this in the response header, if your deployment has DEBUG_ROUTING=1 set:
X-Routor-Confidence: 0.87
Otherwise, call the Debug Endpoint with the same prompt to get the confidence score on demand.
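
One way to get a score with these properties is to pass the distance to the nearest tier boundary through a logistic curve, so a prompt sitting exactly on a boundary scores 0.5 and a prompt far from any boundary approaches 1. This is an assumed formula for illustration; Routor's actual confidence calculation, the boundaries, and the steepness constant are not documented here.

```python
import math


def confidence(score: float, boundaries: list[float], steepness: float = 12.0) -> float:
    """Logistic function of the distance to the nearest tier boundary.

    Distance 0 gives exactly 0.5; larger distances approach 1.0.
    """
    distance = min(abs(score - b) for b in boundaries)
    return 1.0 / (1.0 + math.exp(-steepness * distance))


boundaries = [0.2, 0.4, 0.6, 0.8]       # hypothetical tier boundaries
print(confidence(0.40, boundaries))      # on a boundary
print(confidence(0.30, boundaries))      # midway between two boundaries
```

Note that the tier itself still comes from the score's position relative to the boundaries; a function like this only describes how decisive that position was.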

Why Routor Does Not Use an AI Model to Route

The obvious question: why not just ask a fast model to classify each request? Some routers do this. It is the wrong tradeoff.

It adds 500 to 2000ms of latency. Calling an LLM to classify a prompt before calling the actual model can double your round-trip time. Users feel that. A routing step that takes longer than the actual answer defeats the point.

It costs money on every single request. Every classification is a billable API call. For a product doing 100,000 requests a day, the routing cost alone becomes a meaningful line item before you have even paid for the actual answer.

It creates a loop problem. If the classifier itself is a model call, what decides which model does the classifying? And what happens when the classifier model goes down?

You do not need AI to decide complexity. Whether a prompt needs a reasoning specialist or a cheap fast model is a deterministic signal. It does not require intelligence. It requires speed. Routor’s classifier runs in under 1ms, consumes zero tokens, makes zero API calls, and costs nothing per request.

The result is that Routor’s routing overhead is effectively zero. Your users never wait for a routing decision. Your bill never includes a routing cost.