
LLM API Integration Reference 2026

Reviewed by Josh Ausmus · Updated April 2026

Live reference · updated continuously

Provider Comparison

Set ANTHROPIC_API_KEY, OPENAI_API_KEY, or XAI_API_KEY as environment variables; see the Setup Checklist below for per-provider steps.

Anthropic
  • Auth method: x-api-key header
  • Base URL: https://api.anthropic.com
  • Flagship models: claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5
  • Rate limits (approx): ~50 RPM, varies by tier and model (ITPM 30k-50k)
  • Pricing per 1M tokens (input/output): Haiku 4.5 $1/$5, Sonnet 4.6 $3/$15, Opus 4.6 $5/$25 [1][2]

OpenAI
  • Auth method: Bearer token
  • Base URL: https://api.openai.com
  • Flagship models: gpt-5.4, gpt-5.4-mini
  • Rate limits (approx): tiered (500-10k+ RPM depending on spend)
  • Pricing per 1M tokens (input/output): gpt-5.4 ~$2.50/$15, mini variants cheaper [3][4]

xAI/Grok
  • Auth method: Bearer token (OpenAI-compatible)
  • Base URL: https://api.x.ai
  • Flagship models: grok-4.20, grok-4.1-fast
  • Rate limits (approx): tiered by spend ($0-$5k+ tiers)
  • Pricing per 1M tokens (input/output): grok-4.1-fast $0.20/$0.50, others ~$2-3/$6-15 [5][6]
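
If you call the HTTP endpoints directly instead of going through an SDK, the auth difference above looks like this. A minimal sketch: header values are read from the environment variables mentioned earlier, and the anthropic-version value shown is the commonly documented one.

import os

# Anthropic: the key goes in an x-api-key header; an anthropic-version header is also required.
anthropic_headers = {
    "x-api-key": os.getenv("ANTHROPIC_API_KEY"),
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}

# OpenAI and xAI: standard Authorization: Bearer token.
openai_headers = {
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
    "Content-Type": "application/json",
}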

Basic Completion

Python

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain PID briefly."}]
)
print(resp.content[0].text)

TypeScript

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();  // reads ANTHROPIC_API_KEY from the environment
const resp = await anthropic.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain PID briefly." }]
});
console.log(resp.content[0].text);

For OpenAI and xAI, the equivalent call is client.chat.completions.create with matching model and messages. xAI uses the OpenAI SDK with base_url="https://api.x.ai/v1".
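
A minimal sketch of that swap, hitting xAI through the OpenAI SDK (model and base URL from the comparison above; the key is read from XAI_API_KEY):

import os
from openai import OpenAI

# Same SDK, different endpoint and key; drop the base_url/api_key overrides for OpenAI itself.
client = OpenAI(base_url="https://api.x.ai/v1", api_key=os.getenv("XAI_API_KEY"))
resp = client.chat.completions.create(
    model="grok-4.1-fast",
    messages=[{"role": "user", "content": "Explain PID briefly."}]
)
print(resp.choices[0].message.content)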

Streaming

Python (OpenAI-style; works for all three providers with minor SDK adjustments)

from openai import OpenAI

client = OpenAI(base_url="https://api.x.ai/v1", api_key="xai-...")
stream = client.chat.completions.create(
    model="grok-4.1-fast",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Anthropic uses stream=True on messages.create and iterates over Stream[RawMessageStreamEvent].
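
A minimal sketch of the same streaming loop with the Anthropic Python SDK, printing only text deltas and ignoring the other event types:

from anthropic import Anthropic

client = Anthropic()
stream = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)
for event in stream:
    # Text arrives in content_block_delta events; message/content start and stop events are skipped.
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="")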

Tool Calling

Define tools once. OpenAI and xAI use tools list with type: "function". Anthropic uses tools with input_schema.
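
A sketch of the same hypothetical get_weather tool in both shapes (the tool name and parameters are illustrative, not part of either API):

# Shared JSON Schema for the tool's arguments.
weather_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}

# OpenAI / xAI: the schema sits under a "function" wrapper as "parameters".
openai_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": weather_schema,
    },
}]

# Anthropic: the schema sits at the top level as "input_schema".
anthropic_tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": weather_schema,
}]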

Structured Output (JSON mode)

Python (OpenAI with Pydantic)

from pydantic import BaseModel
from openai import OpenAI

class PIDParams(BaseModel):
 kp: float
 ki: float
 kd: float

client = OpenAI()
# parse() validates the reply against the Pydantic model defined above.
completion = client.beta.chat.completions.parse(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Suggest PID for a thermostat."}],
    response_format=PIDParams
)
print(completion.choices[0].message.parsed)

Anthropic enforces structure via tool definitions with JSON schema. Use tools + handle tool_use responses.
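
A minimal sketch of that pattern: tool_choice forces the model to call a hypothetical record_pid tool, so the answer comes back as a tool_use block whose input matches the JSON schema:

from anthropic import Anthropic

client = Anthropic()
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[{
        "name": "record_pid",
        "description": "Record suggested PID parameters",
        "input_schema": {
            "type": "object",
            "properties": {
                "kp": {"type": "number"},
                "ki": {"type": "number"},
                "kd": {"type": "number"},
            },
            "required": ["kp", "ki", "kd"],
        },
    }],
    tool_choice={"type": "tool", "name": "record_pid"},
    messages=[{"role": "user", "content": "Suggest PID for a thermostat."}]
)
# The structured payload is the tool_use block's input dict.
tool_use = next(b for b in resp.content if b.type == "tool_use")
print(tool_use.input)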

Error Handling: Retry with Backoff

import time
import random

from openai import APIError, APITimeoutError, RateLimitError

def call_with_retry(client, max_retries=5, **kwargs):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            # Exponential backoff with jitter on 429s.
            time.sleep((2 ** attempt) + random.random())
        except (APIError, APITimeoutError):
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)
    raise Exception("Max retries exceeded")

Detect Anthropic rate limits via HTTP status 429 and the anthropic-ratelimit-* response headers.
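
One way to read those headers from the Python SDK is the with_raw_response wrapper. A sketch; the specific header names shown are examples from the anthropic-ratelimit-* family:

from anthropic import Anthropic

client = Anthropic()
raw = client.messages.with_raw_response.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
# Remaining request/token budgets come back as response headers.
print(raw.headers.get("anthropic-ratelimit-requests-remaining"))
print(raw.headers.get("anthropic-ratelimit-tokens-remaining"))
message = raw.parse()  # the usual Message object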

Cost Tracking Middleware Pattern

Wrap the client, track input and output tokens from the usage object on every response, and log the cost or export it to Prometheus. A simple version:

class CostTracker:
    def __init__(self):
        self.total_cost = 0.0

    def track(self, usage, model):
        # Plug in model-specific $/1M-token rates here; Sonnet 4.6 shown ($3 in / $15 out).
        cost = (usage.input_tokens * 3.00 + usage.output_tokens * 15.00) / 1_000_000
        self.total_cost += cost
        return cost

Attach as a decorator or middleware. Recalculate on every response.
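
A sketch of the wrapper approach, assuming the Anthropic SDK's usage fields (input_tokens/output_tokens); the OpenAI SDK reports prompt_tokens/completion_tokens instead, so adjust track() accordingly:

from anthropic import Anthropic

client = Anthropic()
tracker = CostTracker()

def tracked_create(**kwargs):
    # Call the API, then record cost from the usage block attached to every response.
    resp = client.messages.create(**kwargs)
    tracker.track(resp.usage, kwargs["model"])
    return resp

resp = tracked_create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
print(f"Running total: ${tracker.total_cost:.4f}")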

Setup Checklist

Anthropic

  • pip install anthropic==0.88.* (or npm equivalent)
  • export ANTHROPIC_API_KEY=sk-ant-...
  • client = Anthropic() (pulls from env)
  • First call: use claude-sonnet-4-6, max_tokens=1024

OpenAI

  • pip install openai
  • export OPENAI_API_KEY=sk-...
  • client = OpenAI()
  • First call: gpt-5.4 or gpt-5.4-mini

xAI

  • Use openai SDK
  • export XAI_API_KEY=xai-...
  • client = OpenAI(base_url="https://api.x.ai/v1", api_key=os.getenv("XAI_API_KEY"))
  • Test with grok-4.1-fast

Common Gotchas

  • Token counting differs. Anthropic includes cache tokens in billing.
  • Structured outputs fail silently on weak models. Test with Sonnet or gpt-5.4 first.
  • Rate limits are per-tier and per-model. High-volume needs spend-based tiers.
  • Long context (>200k on some models) doubles Anthropic pricing.
  • Tool calls return control to you. Always loop until there are no more tool_use blocks and you have a final assistant message (see the sketch after this list).
  • xAI OpenAI-compat is close but check tool schema support. Some edge cases differ.
  • Never put secrets in messages. Use system prompts or separate context.
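
A minimal sketch of that loop against the Anthropic API, reusing anthropic_tools from the Tool Calling section; run_tool is a hypothetical dispatcher that executes your function and returns a string result:

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
while True:
    resp = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=anthropic_tools,
        messages=messages
    )
    if resp.stop_reason != "tool_use":
        break  # final assistant message; no further tools requested
    # Echo the assistant turn, then answer every tool_use block with a tool_result.
    messages.append({"role": "assistant", "content": resp.content})
    results = [
        {"type": "tool_result", "tool_use_id": block.id, "content": run_tool(block.name, block.input)}
        for block in resp.content if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})
print(resp.content[0].text)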

Production code retries on 429 and 5xx. Track every token. Pick the cheapest model that meets your quality bar. Test structured output with real payloads. The spec sheet rarely matches real failure modes. If your loop exceeds three tool rounds, redesign the prompt.
