Provider Comparison
| Provider | Auth Method | Base URL | Flagship Models | Rate Limits (typical Tier 2) | Pricing per 1M tokens (input/output) |
|---|---|---|---|---|---|
| Anthropic | x-api-key | https://api.anthropic.com | claude-opus-4.6, claude-sonnet-4.6 | ~1k RPM, tiered TPM | $5/$25 (Opus), $3/$15 (Sonnet) |
| OpenAI | Bearer token | https://api.openai.com/v1 | gpt-5.4, gpt-5.4-mini | Tier-based, spend scaled | $2.50/$15 (gpt-5.4) |
| xAI/Grok | Bearer token | https://api.x.ai/v1 | grok-4.20, grok-4.1-fast | Tier by cumulative spend | $2-3/$6-15 (flagship), $0.20/$0.50 (fast) |
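For multi-provider code, the comparison table can be captured once as a config map. A minimal sketch in Python — the `PROVIDERS` name and dict shape are this sketch's own, and model names mirror the table above (verify against current provider docs):

```python
# Provider settings from the comparison table, as one lookup structure.
PROVIDERS = {
    "anthropic": {
        "base_url": "https://api.anthropic.com",
        "auth_header": "x-api-key",
        "models": ["claude-opus-4.6", "claude-sonnet-4.6"],
    },
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "auth_header": "Authorization",  # sent as "Bearer <key>"
        "models": ["gpt-5.4", "gpt-5.4-mini"],
    },
    "xai": {
        "base_url": "https://api.x.ai/v1",
        "auth_header": "Authorization",  # sent as "Bearer <key>"
        "models": ["grok-4.20", "grok-4.1-fast"],
    },
}
```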
Basic Completion
```python
# Python - OpenAI compatible (works for OpenAI and xAI)
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="https://api.x.ai/v1")  # change base_url for xAI

resp = client.chat.completions.create(
    model="grok-4.20",
    messages=[{"role": "user", "content": "Explain PID loops."}],
    temperature=0.7,
    max_tokens=512,
)
print(resp.choices[0].message.content)
```
```typescript
// TypeScript - OpenAI compatible
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: 'https://api.x.ai/v1'
});

const resp = await client.chat.completions.create({
  model: 'grok-4.20',
  messages: [{ role: 'user', content: 'Explain PID loops.' }],
  temperature: 0.7,
  max_tokens: 512
});
console.log(resp.choices[0].message.content);
```
Anthropic Messages (Python + TS)
```python
# Python Anthropic SDK 0.88.x
from anthropic import Anthropic

client = Anthropic(api_key="sk-ant-...")
resp = client.messages.create(
    model="claude-sonnet-4.6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain PID loops."}],
)
print(resp.content[0].text)
```
```typescript
// TypeScript Anthropic
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const resp = await client.messages.create({
  model: "claude-sonnet-4.6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain PID loops." }]
});
console.log(resp.content[0].text);
```
Streaming
Pass stream=True to chat.completions.create for OpenAI-compatible clients; you then iterate over chunks instead of getting one response. For Anthropic, use client.messages.stream() (a context manager) or pass stream=True to messages.create.
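A minimal streaming helper for OpenAI-compatible clients — the `stream_openai` name and generator shape are this sketch's, not the SDK's:

```python
def stream_openai(client, model, prompt):
    """Yield text deltas from an OpenAI-compatible streaming response."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # role-only and final chunks carry no text
            yield delta

# usage: print pieces as they arrive
# for piece in stream_openai(client, "grok-4.20", "Explain PID loops."):
#     print(piece, end="", flush=True)
```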
Tool Calling
OpenAI/xAI style (function tools in the request). Anthropic uses a tools array with input_schema. The pattern is similar but the schema format differs slightly; test both in the same codebase behind a thin wrapper.
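One way to sketch that thin wrapper: define the JSON schema once and emit both provider shapes. The helper names here are hypothetical; the real difference is that OpenAI/xAI nest the schema under function.parameters while Anthropic keeps a flat dict with input_schema.

```python
def to_openai_tool(name, description, schema):
    # OpenAI/xAI shape: nested under "function", schema key is "parameters"
    return {"type": "function",
            "function": {"name": name, "description": description,
                         "parameters": schema}}

def to_anthropic_tool(name, description, schema):
    # Anthropic shape: flat dict, schema key is "input_schema"
    return {"name": name, "description": description, "input_schema": schema}

# one schema, two provider payloads
weather_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
```

Pass `[to_openai_tool(...)]` as `tools=` to chat.completions.create, or `[to_anthropic_tool(...)]` to messages.create.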
Structured Output (JSON mode)
```python
# OpenAI / xAI
resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=[...],
    response_format={"type": "json_object"},
    tools=[{"type": "function", "function": {...}}],  # or a json_schema response_format
)
```
Anthropic supports tool_choice with JSON schema tools for guaranteed structure.
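A sketch of that pattern — the `structured_call` helper and `record_summary` tool names are illustrative, not from the SDK: force a tool call with tool_choice, then read the schema-shaped arguments off the tool_use content block.

```python
def structured_call(client, model, prompt, tool):
    """Force Claude to 'call' a schema tool; the tool input IS the structured output."""
    resp = client.messages.create(
        model=model,
        max_tokens=1024,
        tools=[tool],
        tool_choice={"type": "tool", "name": tool["name"]},  # force this tool
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].input  # dict shaped by the tool's input_schema

summary_tool = {
    "name": "record_summary",
    "description": "Record a structured summary of the text.",
    "input_schema": {
        "type": "object",
        "properties": {"title": {"type": "string"},
                       "score": {"type": "number"}},
        "required": ["title", "score"],
    },
}
```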
Error Handling: Retry with Exponential Backoff
```python
import random
import time

from openai import APIConnectionError, APITimeoutError, RateLimitError

def call_with_retry(client, max_retries=5, **kwargs):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            wait = (2 ** attempt) * 0.5 + random.uniform(0, 0.1)  # backoff + jitter
            time.sleep(wait)
        except (APIConnectionError, APITimeoutError):
            time.sleep(2 ** attempt)
        except Exception as e:
            if "timeout" in str(e).lower():
                time.sleep(1)
            else:
                raise
    raise Exception("Max retries exceeded")
```
Cost Tracking Middleware Pattern
Wrap the client call. Track input/output tokens from response.usage and log them or send them to your observability layer (Prometheus, a custom DB). Add a context manager that accumulates per-request and per-session totals. For Anthropic, read input_tokens/output_tokens from the usage object on the response.
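A minimal sketch of that tracker. The class name and the prices in `PRICES` are placeholders (check current provider pricing pages); wire the `record` calls to your observability layer as needed.

```python
class CostTracker:
    """Accumulate token usage and estimated cost across calls."""
    # Hypothetical $/1M-token (input, output) prices; verify against current docs.
    PRICES = {"gpt-5.4": (2.50, 15.00), "claude-sonnet-4.6": (3.00, 15.00)}

    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0
        self.cost_usd = 0.0

    def record(self, model, input_tokens, output_tokens):
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        in_p, out_p = self.PRICES.get(model, (0.0, 0.0))
        self.cost_usd += (input_tokens * in_p + output_tokens * out_p) / 1_000_000

    def record_openai(self, model, resp):
        # OpenAI/xAI naming: usage.prompt_tokens / usage.completion_tokens
        self.record(model, resp.usage.prompt_tokens, resp.usage.completion_tokens)

    def record_anthropic(self, model, resp):
        # Anthropic naming: usage.input_tokens / usage.output_tokens
        self.record(model, resp.usage.input_tokens, resp.usage.output_tokens)
```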
Setup Checklist
Anthropic
- pip install anthropic==0.88.* (or latest 0.88.x)
- Set the ANTHROPIC_API_KEY env var
- First call: use the Messages API; max_tokens is required
- Enable prompt caching on repeated system prompts for 90%+ savings

OpenAI
- pip install openai (use openai>=1.0)
- Set OPENAI_API_KEY
- First call: chat.completions.create

xAI/Grok
- Use the OpenAI SDK
- Set base_url to https://api.x.ai/v1
- Use XAI_API_KEY (or reuse the same key name)
- Start with grok-4.1-fast for cost-sensitive work before moving to flagship
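To sanity-check the env vars from the checklists above, a tiny helper (the `check_env` name is this sketch's own):

```python
import os

def check_env(names=("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "XAI_API_KEY")):
    """Report which expected API-key env vars are set and non-empty."""
    return {name: bool(os.environ.get(name)) for name in names}

# returns a dict like {"ANTHROPIC_API_KEY": True, ...} for each name checked
```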
Common Integration Gotchas
- Rate limits are per-tier and spend-based on xAI and OpenAI. Monitor headers or console.
- Anthropic requires max_tokens on every call. No default.
- Token counts differ slightly across providers. Never assume 1:1 cost transfer.
- Streaming responses consume rate limit the same as non-streaming. Plan accordingly.
- Tool calling schemas must match provider expectations exactly, or calls fail — sometimes silently, with malformed structure rather than an error.
- Long context costs more on some models. Stay under the pricing threshold when possible (check current docs).
- Always implement token usage logging. Costs add up faster than you expect on production traffic.
Use the cheapest model that clears your quality bar. Track real costs per feature. Retry logic belongs in one shared client wrapper. Test structured output with real schemas before shipping. The signal chain here is prompt to tokens to dollars; get the middle part right and the rest gets easier.