
LLM API Integration Reference 2026

Reviewed by Josh Ausmus · Updated April 2026

Live reference · updated continuously

Provider Comparison

Set ANTHROPIC_API_KEY, OPENAI_API_KEY, or XAI_API_KEY as environment variables; see the Setup Checklist below for per-provider steps.

Anthropic
  • Auth method: x-api-key header
  • Base URL: https://api.anthropic.com
  • Flagship models: claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5
  • Rate limits (approx): ~50 RPM, varies by tier and model (ITPM 30k-50k)
  • Pricing per 1M tokens (input/output): Haiku 4.5 $1/$5, Sonnet 4.6 $3/$15, Opus 4.6 $5/$25 [1][2]

OpenAI
  • Auth method: Bearer token
  • Base URL: https://api.openai.com
  • Flagship models: gpt-5.4, gpt-5.4-mini
  • Rate limits (approx): tiered (500-10k+ RPM depending on spend)
  • Pricing per 1M tokens (input/output): gpt-5.4 ~$2.50/$15, mini variants cheaper [3][4]

xAI/Grok
  • Auth method: Bearer token (OpenAI-compatible)
  • Base URL: https://api.x.ai
  • Flagship models: grok-4.20, grok-4.1-fast
  • Rate limits (approx): tiered by spend ($0-$5k+ tiers)
  • Pricing per 1M tokens (input/output): grok-4.1-fast $0.20/$0.50, others ~$2-3/$6-15 [5][6]
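
If you call the HTTP endpoints directly instead of going through an SDK, the auth difference above looks like this. A minimal sketch: header values are read from the environment variables mentioned earlier, and the anthropic-version value shown is the commonly documented one.

import os

# Anthropic: the key goes in an x-api-key header; an anthropic-version header is also required.
anthropic_headers = {
    "x-api-key": os.getenv("ANTHROPIC_API_KEY"),
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}

# OpenAI and xAI: standard Authorization: Bearer token.
openai_headers = {
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
    "Content-Type": "application/json",
}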

Basic Completion

Python

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain PID briefly."}]
)
print(resp.content[0].text)

TypeScript

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();  // reads ANTHROPIC_API_KEY from the environment
const resp = await anthropic.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain PID briefly." }]
});
console.log(resp.content[0].text);

For OpenAI and xAI, the equivalent call is client.chat.completions.create with matching model and messages. xAI uses the OpenAI SDK with base_url="https://api.x.ai/v1".
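
A minimal sketch of that swap, hitting xAI through the OpenAI SDK (model and base URL from the comparison above; the key is read from XAI_API_KEY):

import os
from openai import OpenAI

# Same SDK, different endpoint and key; drop the base_url/api_key overrides for OpenAI itself.
client = OpenAI(base_url="https://api.x.ai/v1", api_key=os.getenv("XAI_API_KEY"))
resp = client.chat.completions.create(
    model="grok-4.1-fast",
    messages=[{"role": "user", "content": "Explain PID briefly."}]
)
print(resp.choices[0].message.content)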

Streaming

Python (OpenAI-style; works for all three providers with minor SDK adjustments)

from openai import OpenAI

client = OpenAI(base_url="https://api.x.ai/v1", api_key="xai-...")
stream = client.chat.completions.create(
    model="grok-4.1-fast",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Anthropic uses stream=True on messages.create and iterates over Stream[RawMessageStreamEvent].
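
A minimal sketch of the same streaming loop with the Anthropic Python SDK, printing only text deltas and ignoring the other event types:

from anthropic import Anthropic

client = Anthropic()
stream = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)
for event in stream:
    # Text arrives in content_block_delta events; message/content start and stop events are skipped.
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="")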

Tool Calling

Define tools once. OpenAI and xAI use tools list with type: "function". Anthropic uses tools with input_schema.
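
A sketch of the same hypothetical get_weather tool in both shapes (the tool name and parameters are illustrative, not part of either API):

# Shared JSON Schema for the tool's arguments.
weather_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}

# OpenAI / xAI: the schema sits under a "function" wrapper as "parameters".
openai_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": weather_schema,
    },
}]

# Anthropic: the schema sits at the top level as "input_schema".
anthropic_tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": weather_schema,
}]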

Structured Output (JSON mode)

Python (OpenAI with Pydantic)

from pydantic import BaseModel
from openai import OpenAI

class PIDParams(BaseModel):
 kp: float
 ki: float
 kd: float

client = OpenAI()
# parse() validates the reply against the Pydantic model defined above.
completion = client.beta.chat.completions.parse(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Suggest PID for a thermostat."}],
    response_format=PIDParams
)
print(completion.choices[0].message.parsed)

Anthropic enforces structure via tool definitions with JSON schema. Use tools + handle tool_use responses.
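
A minimal sketch of that pattern: tool_choice forces the model to call a hypothetical record_pid tool, so the answer comes back as a tool_use block whose input matches the JSON schema:

from anthropic import Anthropic

client = Anthropic()
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[{
        "name": "record_pid",
        "description": "Record suggested PID parameters",
        "input_schema": {
            "type": "object",
            "properties": {
                "kp": {"type": "number"},
                "ki": {"type": "number"},
                "kd": {"type": "number"},
            },
            "required": ["kp", "ki", "kd"],
        },
    }],
    tool_choice={"type": "tool", "name": "record_pid"},
    messages=[{"role": "user", "content": "Suggest PID for a thermostat."}]
)
# The structured payload is the tool_use block's input dict.
tool_use = next(b for b in resp.content if b.type == "tool_use")
print(tool_use.input)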

Error Handling: Retry with Backoff

import time
import random

from openai import APIError, APITimeoutError, RateLimitError

def call_with_retry(client, max_retries=5, **kwargs):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            # Exponential backoff with jitter on 429s.
            time.sleep((2 ** attempt) + random.random())
        except (APIError, APITimeoutError):
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)
    raise Exception("Max retries exceeded")

Detect Anthropic rate limits via HTTP status 429 and the anthropic-ratelimit-* response headers.
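
One way to read those headers from the Python SDK is the with_raw_response wrapper. A sketch; the specific header names shown are examples from the anthropic-ratelimit-* family:

from anthropic import Anthropic

client = Anthropic()
raw = client.messages.with_raw_response.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
# Remaining request/token budgets come back as response headers.
print(raw.headers.get("anthropic-ratelimit-requests-remaining"))
print(raw.headers.get("anthropic-ratelimit-tokens-remaining"))
message = raw.parse()  # the usual Message object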

Cost Tracking Middleware Pattern

Wrap the client, track input and output tokens from the usage object on every response, and log the cost or export it to Prometheus. A simple version:

class CostTracker:
    def __init__(self):
        self.total_cost = 0.0

    def track(self, usage, model):
        # Plug in model-specific $/1M-token rates here; Sonnet 4.6 shown ($3 in / $15 out).
        cost = (usage.input_tokens * 3.00 + usage.output_tokens * 15.00) / 1_000_000
        self.total_cost += cost
        return cost

Attach as a decorator or middleware. Recalculate on every response.
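
A sketch of the wrapper approach, assuming the Anthropic SDK's usage fields (input_tokens/output_tokens); the OpenAI SDK reports prompt_tokens/completion_tokens instead, so adjust track() accordingly:

from anthropic import Anthropic

client = Anthropic()
tracker = CostTracker()

def tracked_create(**kwargs):
    # Call the API, then record cost from the usage block attached to every response.
    resp = client.messages.create(**kwargs)
    tracker.track(resp.usage, kwargs["model"])
    return resp

resp = tracked_create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
print(f"Running total: ${tracker.total_cost:.4f}")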

Setup Checklist

Anthropic

  • pip install anthropic==0.88.* (or npm equivalent)
  • export ANTHROPIC_API_KEY=sk-ant-...
  • client = Anthropic() (pulls from env)
  • First call: use claude-sonnet-4-6, max_tokens=1024

OpenAI

  • pip install openai
  • export OPENAI_API_KEY=sk-...
  • client = OpenAI()
  • First call: gpt-5.4 or gpt-5.4-mini

xAI

  • Use openai SDK
  • export XAI_API_KEY=xai-...
  • client = OpenAI(base_url="https://api.x.ai/v1", api_key=os.getenv("XAI_API_KEY"))
  • Test with grok-4.1-fast

Common Gotchas

  • Token counting differs. Anthropic includes cache tokens in billing.
  • Structured outputs fail silently on weak models. Test with Sonnet or gpt-5.4 first.
  • Rate limits are per-tier and per-model. High-volume needs spend-based tiers.
  • Long context (>200k on some models) doubles Anthropic pricing.
  • Tool calls return control to you. Always loop until there are no more tool_use blocks and you have a final assistant message (see the sketch after this list).
  • xAI OpenAI-compat is close but check tool schema support. Some edge cases differ.
  • Never put secrets in messages. Use system prompts or separate context.
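
A minimal sketch of that loop against the Anthropic API, reusing anthropic_tools from the Tool Calling section; run_tool is a hypothetical dispatcher that executes your function and returns a string result:

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
while True:
    resp = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=anthropic_tools,
        messages=messages
    )
    if resp.stop_reason != "tool_use":
        break  # final assistant message; no further tools requested
    # Echo the assistant turn, then answer every tool_use block with a tool_result.
    messages.append({"role": "assistant", "content": resp.content})
    results = [
        {"type": "tool_result", "tool_use_id": block.id, "content": run_tool(block.name, block.input)}
        for block in resp.content if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})
print(resp.content[0].text)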

Production code retries on 429 and 5xx. Track every token. Pick the cheapest model that meets your quality bar. Test structured output with real payloads. The spec sheet rarely matches real failure modes. If your loop exceeds three tool rounds, redesign the prompt.
