What OpenRouter is
- Unified OpenAI-compatible API that routes your requests to 300+ models from 60+ providers.
- Handles provider keys, fallbacks, uptime optimization and billing in one place.
- Free tier gives 25+ models and 50 requests per day. Paid adds everything else with a 5.5% platform fee on credit purchases.
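The fallback behavior is configured per request. A minimal sketch of a payload using OpenRouter's `models` array for ordered fallbacks (the exact field names follow OpenRouter's routing docs as I understand them; the model IDs here are just the ones used elsewhere in this note):

```python
def build_payload(prompt: str) -> dict:
    """Request payload with a primary model plus ordered fallbacks."""
    return {
        "model": "openai/gpt-5.2",          # primary choice
        "models": [                         # tried in order on error/unavailability
            "anthropic/claude-sonnet-4.6",
            "deepseek/deepseek-chat",
        ],
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_payload("ping")
print(payload["models"][0])  # → anthropic/claude-sonnet-4.6
```

Verify the shape on the model's page before relying on it; routing parameters have changed between API versions.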
Quick start
Get your API key from openrouter.ai/keys. Add these optional headers for attribution (they feed OpenRouter's public app rankings):

HTTP-Referer: your site URL
X-Title: your app name
Python basic completion
```python
import os

import requests

API_KEY = os.getenv("OPENROUTER_API_KEY")
url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "HTTP-Referer": "https://yourapp.com",  # optional attribution
    "X-Title": "MyApp",                     # optional attribution
}
payload = {
    "model": "openai/gpt-5.2",
    "messages": [{"role": "user", "content": "Explain PID control loops in 50 words."}],
}
response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```
Python streaming
Add `"stream": true` to the payload and parse the SSE stream yourself, or point the official `openai` library at OpenRouter with `base_url="https://openrouter.ai/api/v1"`.
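Handling the SSE stream yourself is mostly line parsing: each event arrives as a `data: {...}` line, and the stream ends with a `data: [DONE]` sentinel. A minimal parser, run here on a canned stream rather than a live response:

```python
import json

def collect_stream(lines):
    """Parse OpenAI-style SSE lines and join the content deltas."""
    out = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue                       # skip blank keep-alives and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":               # end-of-stream sentinel
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            out.append(delta)
    return "".join(out)

# Canned example of the wire format:
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # → Hello
```

Against a live endpoint you would feed it `requests.post(..., stream=True)` via `response.iter_lines(decode_unicode=True)`.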
TypeScript with OpenAI SDK (recommended)
```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: {
    'HTTP-Referer': 'https://yourapp.com', // optional attribution
    'X-Title': 'MyApp',                    // optional attribution
  },
});

const completion = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.6',
  messages: [{ role: 'user', content: 'Your prompt here' }],
  stream: true, // set to false for non-streaming
});

for await (const chunk of completion) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```
Tool calling example (add to payload)
```json
"tools": [{
  "type": "function",
  "function": {
    "name": "get_weather",
    "parameters": {
      "type": "object",
      "properties": { "location": { "type": "string" } }
    }
  }
}]
```
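When the model decides to call the tool, the response message carries a `tool_calls` array instead of text, and you reply with a `tool`-role message whose `tool_call_id` matches. A sketch of that bookkeeping, using a mocked assistant turn (no network; `get_weather` and its result are stand-ins):

```python
import json

def tool_reply_messages(messages, assistant_msg, run_tool):
    """Append the assistant's tool call and our tool result to the history."""
    messages = messages + [assistant_msg]
    for call in assistant_msg["tool_calls"]:
        args = json.loads(call["function"]["arguments"])
        result = run_tool(call["function"]["name"], args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],    # must match the call's id
            "content": json.dumps(result),
        })
    return messages

# Mocked assistant turn, shaped like an API response message:
assistant = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": '{"location": "Oslo"}'},
    }],
}
history = tool_reply_messages(
    [{"role": "user", "content": "Weather in Oslo?"}],
    assistant,
    lambda name, args: {"temp_c": 4},      # stand-in tool implementation
)
print(history[-1]["role"])  # → tool
```

Send the extended `history` back in a second request and the model writes its final answer from the tool result.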
Web search and other plugins work only through models whose underlying providers support them (e.g. certain Gemini or Claude variants), so pick a model that exposes the feature you need.
Best models on OpenRouter (April 2026)
| Model | Approx. input / output ($/M tokens) | Context | Best use case |
|---|---|---|---|
| MiMo-V2-Flash | ~0.09 / ~0.30 | 128k+ | Fast, cheap general tasks |
| Step 3.5 Flash (free variant) | 0 / 0 on free tier | 256k | Experimentation, prototyping |
| DeepSeek V3.2 / R1 | 0.25 / 0.38–2.50 | 128k–200k | Strong coding and reasoning |
| Grok 4.1 Fast | ~0.20 / ~0.80 | 2M | Long-context extraction |
| Qwen3.6 Plus / variants | 0.45–0.60 / ~1–2 | 1M | General knowledge, math |
| Gemini 3 Flash / 3.1 Pro | 0.50 / 1.5–2.0 | 1M+ | Multimodal (vision, audio) |
| Claude Sonnet 4.6 | ~3.00 / ~15 | 1M | Reliable coding, agent work |
| GPT-5.4 / GPT-5.2 | 2.50 / 15+ | 200k–1M | Top-tier reasoning, complex tasks |
| Claude Opus 4.6 | ~5.00 / 25+ | 1M | Hardest software engineering |
| Llama 3.3 70B (free) | 0 / 0 on free tier | 128k | Local-like open-model testing |
Prices reflect weighted averages across providers; check each model page for exact routing options. Free variants carry a `:free` suffix in the model ID and stricter rate limits.
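Per-million-token prices translate to per-request cost with one line of arithmetic. Using the (illustrative) Claude Sonnet 4.6 numbers from the table above:

```python
def request_cost(in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    """Cost in dollars for one request, given $/M-token prices."""
    return (in_tokens * in_price_per_m + out_tokens * out_price_per_m) / 1_000_000

# ~$3 in / ~$15 out, with a 10k-token prompt and 1k-token completion:
cost = request_cost(10_000, 1_000, 3.0, 15.0)
print(f"${cost:.3f}")  # → $0.045
```

At that rate, a thousand such requests costs about $45 before the platform fee.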
Honest review
Strengths
- Single key and one SDK gets you real choice between providers without managing 60 billing accounts.
- Routing and fallbacks actually improve uptime compared to raw provider APIs.
- Free tier works for testing. No credit card needed for the basic models.
- Model pages show real benchmark tags and per-provider pricing transparency.
Limitations and gotchas
- The 5.5% fee adds up at heavy usage, and there is a minimum top-up amount.
- Extra network hop adds 50-200ms latency versus direct provider calls.
- Some models are quantized or routed to cheaper providers with slightly lower quality.
- Free tier is 50 requests per day total. Heavy testing burns through it fast.
- You still need to watch provider-specific quirks. The router doesn't fix broken system prompts.
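Because free-tier limits and multi-provider routing both surface as HTTP 429s and transient 5xxs, requests should be wrapped in retries. A minimal sketch of the standard exponential-backoff-with-jitter pattern (not OpenRouter-specific; thresholds are illustrative):

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def should_retry(status_code, attempt, max_attempts=5):
    """Retry rate limits (429) and transient server errors (5xx)."""
    return attempt < max_attempts and (status_code == 429 or status_code >= 500)

print(should_retry(429, 0))  # → True
print(should_retry(404, 0))  # → False
```

The loop itself is the usual `for attempt in range(max_attempts): ... time.sleep(backoff_delay(attempt))` around the `requests.post` call.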
When to use OpenRouter vs direct API access
Use OpenRouter when
- You want to A/B test multiple models without code changes.
- Your app needs automatic fallbacks or uptime routing.
- You run at low-to-medium volume and value developer time over the 5.5% savings.
- You need one billing dashboard for everything.
Use direct provider APIs when
- You have high volume and the 5.5% matters.
- Latency is critical (real-time apps).
- You already have volume discounts or credits with one provider.
- You need provider-specific features that the router doesn't expose cleanly.
The router shines for prototypes and mid-size products. At large scale the fee and extra hop usually push teams to direct connections or self-hosted gateways.
If your workload fits in the free tier or you swap models often, it saves setup time. Otherwise the convenience tax is real.