## Provider Comparison
Authenticate via the `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `XAI_API_KEY` environment variable, depending on provider.
| Provider | Auth Method | Base URL | Flagship Models | Rate Limits (approx) | Pricing per 1M tokens (input/output) |
|---|---|---|---|---|---|
| Anthropic | x-api-key header | https://api.anthropic.com | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5 | ~50 RPM, varies by tier & model (ITPM 30k-50k) | Haiku 4.5: $1/$5, Sonnet 4.6: $3/$15, Opus 4.6: $5/$25 [1][2] |
| OpenAI | Bearer token | https://api.openai.com | gpt-5.4, gpt-5.4-mini | Tiered (500-10k+ RPM depending on spend) | gpt-5.4: ~$2.50/$15, mini variants cheaper [3][4] |
| xAI/Grok | Bearer token (OpenAI compat) | https://api.x.ai | grok-4.20, grok-4.1-fast | Tiered by spend ($0-$5k+ tiers) | grok-4.1-fast: $0.20/$0.50, others ~$2-3/$6-15 [5][6] |
## Basic Completion

**Python**

```python
from anthropic import Anthropic

client = Anthropic()
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain PID briefly."}],
)
print(resp.content[0].text)
```
**TypeScript**

```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();
const resp = await anthropic.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain PID briefly." }],
});
console.log(resp.content[0].text);
```
For OpenAI and xAI, swap to `client.chat.completions.create` with matching `model` and `messages`. xAI uses the OpenAI SDK with `base_url="https://api.x.ai/v1"`.
## Streaming

**Python** (OpenAI style; works for all three with SDK adjustments)

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.x.ai/v1", api_key="xai-...")
stream = client.chat.completions.create(
    model="grok-4.1-fast",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
Anthropic uses `stream=True` on `messages.create` and iterates over the resulting `Stream[RawMessageStreamEvent]`.
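A sketch of the Anthropic side: text arrives in `content_block_delta` events, so the loop filters on that event type and accumulates `delta.text`. The accumulation logic is pulled into a helper here so it can run without a network call; event and field names follow the Anthropic SDK's documented shapes.

```python
# The real call would be:
#   stream = client.messages.create(model="claude-sonnet-4-6", max_tokens=1024,
#                                   messages=[...], stream=True)
#   print(collect_text(stream))

def collect_text(events):
    """Join the text deltas out of an Anthropic event stream."""
    parts = []
    for event in events:
        # Other event types (message_start, message_stop, ...) carry no text
        if getattr(event, "type", None) == "content_block_delta":
            parts.append(event.delta.text)
    return "".join(parts)
```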
## Tool Calling

Define tools once. OpenAI and xAI take a `tools` list with `type: "function"`; Anthropic takes `tools` entries with an `input_schema`.
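A minimal sketch of the two shapes, projecting one shared JSON schema into each provider's format (the `get_weather` schema is a hypothetical example):

```python
def to_openai_tool(name, description, schema):
    """OpenAI/xAI shape: nested under type: "function"."""
    return {"type": "function",
            "function": {"name": name,
                         "description": description,
                         "parameters": schema}}

def to_anthropic_tool(name, description, schema):
    """Anthropic shape: flat, with input_schema."""
    return {"name": name,
            "description": description,
            "input_schema": schema}

weather_schema = {"type": "object",
                  "properties": {"location": {"type": "string"}},
                  "required": ["location"]}
```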
## Structured Output (JSON mode)

**Python** (OpenAI with Pydantic)

```python
from pydantic import BaseModel
from openai import OpenAI

class PIDParams(BaseModel):
    kp: float
    ki: float
    kd: float

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Suggest PID for a thermostat."}],
    response_format=PIDParams,
)
print(completion.choices[0].message.parsed)
```
Anthropic enforces structure via tool definitions with a JSON schema: pass `tools` and handle the `tool_use` response.
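A sketch of that pattern for the PID example (the `record_pid` tool name and the forced `tool_choice` are assumptions following Anthropic's documented tool-use flow): define one tool whose `input_schema` is the target schema, then read the arguments out of the `tool_use` block.

```python
pid_tool = {
    "name": "record_pid",
    "description": "Record suggested PID parameters.",
    "input_schema": {
        "type": "object",
        "properties": {"kp": {"type": "number"},
                       "ki": {"type": "number"},
                       "kd": {"type": "number"}},
        "required": ["kp", "ki", "kd"],
    },
}

def extract_tool_input(content_blocks, tool_name):
    """Return the input of the first matching tool_use block, if any."""
    for block in content_blocks:
        if getattr(block, "type", None) == "tool_use" and block.name == tool_name:
            return block.input
    return None

# resp = client.messages.create(
#     model="claude-sonnet-4-6", max_tokens=1024,
#     tools=[pid_tool],
#     tool_choice={"type": "tool", "name": "record_pid"},
#     messages=[{"role": "user", "content": "Suggest PID for a thermostat."}],
# )
# params = extract_tool_input(resp.content, "record_pid")
```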
## Error Handling: Retry with Backoff

```python
import time
import random

from openai import APIError, APITimeoutError, RateLimitError

def call_with_retry(client, max_retries=5, **kwargs):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            # Exponential backoff with jitter to avoid a thundering herd
            time.sleep((2 ** attempt) + random.random())
        except (APIError, APITimeoutError):
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)
    raise Exception("Max retries exceeded")
```
Detect Anthropic rate limits via HTTP 429 and the `anthropic-ratelimit-*` response headers.
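For example, a small helper that pulls those headers off a 429 response (header names like `anthropic-ratelimit-requests-remaining` follow Anthropic's documented naming pattern; treat the exact set as an assumption):

```python
def parse_anthropic_ratelimits(headers):
    """Collect anthropic-ratelimit-* headers into {suffix: value}."""
    prefix = "anthropic-ratelimit-"
    return {k.lower()[len(prefix):]: v
            for k, v in headers.items()
            if k.lower().startswith(prefix)}
```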
## Cost Tracking Middleware Pattern

Wrap the client. Track input/output tokens from the response's `usage` object, and log them or send them to Prometheus. A simple version:
```python
class CostTracker:
    def __init__(self):
        self.total_cost = 0.0

    def track(self, usage, model):
        # Look up model-specific $/M rates here; Sonnet 4.6 ($3/$15) shown
        cost = (usage.input_tokens * 3.00 + usage.output_tokens * 15.00) / 1_000_000
        self.total_cost += cost
        return cost
```
Attach as a decorator or middleware. Recalculate on every response.
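One way to attach it, sketched with an injected `track` callable and a stubbed response so it runs offline (the `usage` object with `input_tokens`/`output_tokens` follows the Anthropic SDK shape; `with_cost_tracking` is a hypothetical helper, not a library API):

```python
import functools
from types import SimpleNamespace

def with_cost_tracking(track, model):
    """Decorator: feed every response's usage object to track()."""
    def deco(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            resp = fn(*args, **kwargs)
            track(resp.usage, model)
            return resp
        return inner
    return deco

# Offline demo with a stubbed response in place of a real API call:
seen = []

@with_cost_tracking(lambda usage, model: seen.append(
                        (model, usage.input_tokens, usage.output_tokens)),
                    "claude-sonnet-4-6")
def fake_call():
    return SimpleNamespace(usage=SimpleNamespace(input_tokens=100,
                                                 output_tokens=50))

fake_call()
```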
## Setup Checklist

**Anthropic**

- `pip install anthropic==0.88.*` (or the npm equivalent)
- `export ANTHROPIC_API_KEY=sk-ant-...`
- `client = Anthropic()` (pulls the key from the environment)
- First call: use `claude-sonnet-4-6`, `max_tokens=1024`

**OpenAI**

- `pip install openai`
- `export OPENAI_API_KEY=sk-...`
- `client = OpenAI()`
- First call: `gpt-5.4` or `gpt-5.4-mini`

**xAI**

- Use the `openai` SDK
- `export XAI_API_KEY=xai-...`
- `client = OpenAI(base_url="https://api.x.ai/v1", api_key=os.getenv("XAI_API_KEY"))`
- Test with `grok-4.1-fast`
## Common Gotchas
- Token counting differs. Anthropic includes cache tokens in billing.
- Structured outputs fail silently on weak models. Test with Sonnet or gpt-5.4 first.
- Rate limits are per-tier and per-model. High-volume needs spend-based tiers.
- Long context (>200k on some models) doubles Anthropic pricing.
- Tool calls return control to you. Loop until the model responds with no further `tool_use` blocks, only an assistant message.
- xAI OpenAI-compat is close but check tool schema support. Some edge cases differ.
- Never put secrets in messages. Use system prompts or separate context.
Production code retries on 429 and 5xx. Track every token. Pick the cheapest model that meets quality. Test structured output with real payloads. The spec sheet rarely matches real failure modes. If your loop exceeds three tool rounds, redesign the prompt.
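The tool-round rule above can be enforced with a provider-agnostic loop skeleton. `call_model` and `run_tool` are injected stand-ins so this runs offline; the message and block shapes follow the Anthropic convention as an assumption.

```python
def tool_loop(call_model, run_tool, messages, max_rounds=3):
    """Call the model until a response carries no tool_use blocks."""
    for _ in range(max_rounds):
        resp = call_model(messages)
        tool_uses = [b for b in resp["content"] if b.get("type") == "tool_use"]
        if not tool_uses:
            return resp  # plain assistant message: done
        # Echo the assistant turn back, then answer each tool call
        messages.append({"role": "assistant", "content": resp["content"]})
        messages.append({"role": "user", "content": [
            {"type": "tool_result",
             "tool_use_id": b["id"],
             "content": run_tool(b["name"], b["input"])}
            for b in tool_uses
        ]})
    raise RuntimeError("Exceeded max tool rounds; redesign the prompt")
```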