
LLM API Integration Reference 2026

Reviewed by Josh Ausmus · Updated April 2026


Provider Comparison

| Provider | Auth Method | Base URL | Flagship Models | Rate Limits (typical Tier 2) | Pricing per 1M tokens (input/output) |
| Anthropic | x-api-key header | https://api.anthropic.com | claude-opus-4.6, claude-sonnet-4.6 | ~1k RPM, tiered TPM | $5/$25 (Opus), $3/$15 (Sonnet) |
| OpenAI | Bearer token | https://api.openai.com/v1 | gpt-5.4, gpt-5.4-mini | Tier-based, scales with spend | $2.50/$15 (gpt-5.4) |
| xAI/Grok | Bearer token | https://api.x.ai/v1 | grok-4.20, grok-4.1-fast | Tier by cumulative spend | $2-3/$6-15 (flagship), $0.20/$0.50 (fast) |

Basic Completion

# Python - OpenAI compatible (works for OpenAI and xAI)
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="https://api.x.ai/v1")  # drop base_url for OpenAI
resp = client.chat.completions.create(
    model="grok-4.20",
    messages=[{"role": "user", "content": "Explain PID loops."}],
    temperature=0.7,
    max_tokens=512,
)
print(resp.choices[0].message.content)
// TypeScript - OpenAI compatible
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: 'https://api.x.ai/v1'
});
const resp = await client.chat.completions.create({
  model: 'grok-4.20',
  messages: [{ role: 'user', content: 'Explain PID loops.' }],
  temperature: 0.7,
  max_tokens: 512
});
console.log(resp.choices[0].message.content);

Anthropic Messages (Python + TS)

# Python Anthropic SDK 0.88.x
from anthropic import Anthropic

client = Anthropic(api_key="sk-ant-...")
resp = client.messages.create(
    model="claude-sonnet-4.6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain PID loops."}],
)
print(resp.content[0].text)
// TypeScript Anthropic
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const resp = await client.messages.create({
  model: "claude-sonnet-4.6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain PID loops." }]
});
console.log(resp.content[0].text);

Streaming

Pass stream=True for OpenAI-compatible clients and iterate the returned chunks. For Anthropic, use the client.messages.stream() context-manager helper or pass stream=True to messages.create().
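A minimal sketch of the OpenAI-compatible side (stream_reply is an illustrative helper name, not an SDK function; the client is passed in so the same code runs against OpenAI or xAI by changing base_url):

```python
def stream_reply(client, model, prompt):
    """Yield text deltas as they arrive; the caller assembles the full reply."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
        stream=True,  # server returns an iterator of chunk objects
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # role-only and tool-call chunks carry no text
            yield delta

# from openai import OpenAI
# client = OpenAI()  # or OpenAI(base_url="https://api.x.ai/v1", api_key=...)
# print("".join(stream_reply(client, "gpt-5.4", "Explain PID loops.")))
```

Yielding deltas instead of printing them keeps the helper reusable for both console display and assembling a full string.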

Tool Calling

OpenAI and xAI take function tools in the request (the JSON Schema nested under function.parameters). Anthropic uses a tools array with the schema under input_schema. The calling pattern is the same but the schema envelope differs, so test both in the same codebase behind a thin wrapper.
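A sketch of that thin wrapper: one tool definition, emitted in each provider's shape (the function names and the get_weather example are illustrative, not from any SDK):

```python
def to_openai_tool(name, description, parameters):
    # OpenAI/xAI: "function" tool with the JSON Schema under "parameters"
    return {"type": "function",
            "function": {"name": name,
                         "description": description,
                         "parameters": parameters}}

def to_anthropic_tool(name, description, parameters):
    # Anthropic: flat tool object with the schema under "input_schema"
    return {"name": name,
            "description": description,
            "input_schema": parameters}

schema = {"type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]}
# tools=[to_openai_tool("get_weather", "Look up current weather", schema)]
# tools=[to_anthropic_tool("get_weather", "Look up current weather", schema)]
```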

Structured Output (JSON mode)

# OpenAI / xAI
resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=[...],
    response_format={"type": "json_object"},  # or {"type": "json_schema", ...}
    tools=[{"type": "function", "function": {...}}],
)

Anthropic supports tool_choice with JSON schema tools for guaranteed structure.
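A sketch of that pattern, assuming the Anthropic SDK: force the tool call with tool_choice, then pull the payload out of the tool_use content block (record_result and extract_tool_input are illustrative names):

```python
def extract_tool_input(resp, tool_name):
    """Return the input dict of the first matching tool_use block."""
    for block in resp.content:
        if block.type == "tool_use" and block.name == tool_name:
            return block.input  # already parsed against the tool's schema
    raise ValueError(f"model did not call {tool_name}")

# resp = client.messages.create(
#     model="claude-sonnet-4.6", max_tokens=1024,
#     tools=[{"name": "record_result",
#             "description": "Record the extracted fields.",
#             "input_schema": {"type": "object",
#                              "properties": {"sentiment": {"type": "string"}},
#                              "required": ["sentiment"]}}],
#     tool_choice={"type": "tool", "name": "record_result"},  # forces the call
#     messages=[{"role": "user", "content": "Classify: 'Great latency!'"}])
# data = extract_tool_input(resp, "record_result")
```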

Error Handling: Retry with Exponential Backoff

import time
from openai import RateLimitError, APIConnectionError, APITimeoutError

def call_with_retry(client, max_retries=5, **kwargs):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            wait = (2 ** attempt) * 0.5 + 0.1  # backoff plus small jitter
            time.sleep(wait)
        except (APIConnectionError, APITimeoutError):
            time.sleep(2 ** attempt)
        except Exception as e:
            # catch-all for timeout-like errors the SDK surfaces differently
            if "timeout" in str(e).lower():
                time.sleep(1)
            else:
                raise
    raise Exception("Max retries exceeded")

Cost Tracking Middleware Pattern

Wrap the client call. Track input and output tokens from response.usage and log them or send them to your observability layer (Prometheus, a custom DB). Add a context manager that accumulates per-request and per-session totals. For Anthropic, read the usage object on the response (input_tokens/output_tokens rather than prompt_tokens/completion_tokens).
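A minimal sketch of that context manager, assuming OpenAI-style usage field names and illustrative per-million-token prices (swap the field names for Anthropic and the print for your metrics sink):

```python
class CostTracker:
    def __init__(self, in_price_per_m, out_price_per_m):
        self.in_price = in_price_per_m / 1_000_000   # dollars per input token
        self.out_price = out_price_per_m / 1_000_000  # dollars per output token
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, usage):
        # OpenAI/xAI shape; for Anthropic use usage.input_tokens / output_tokens
        self.input_tokens += usage.prompt_tokens
        self.output_tokens += usage.completion_tokens

    @property
    def dollars(self):
        return (self.input_tokens * self.in_price
                + self.output_tokens * self.out_price)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # replace print with Prometheus counter, DB row, or structured log
        print(f"session: {self.input_tokens} in / {self.output_tokens} out "
              f"= ${self.dollars:.4f}")
        return False

# with CostTracker(3, 15) as tracker:       # Sonnet-tier prices from the table
#     resp = client.chat.completions.create(...)
#     tracker.record(resp.usage)
```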

Setup Checklist

Anthropic

  • pip install "anthropic==0.88.*" (pin the minor version for reproducible builds)
  • Set ANTHROPIC_API_KEY env var
  • First call: use Messages API with max_tokens required
  • Enable prompt caching on repeated system prompts for 90%+ savings
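A sketch of the prompt-caching item above, assuming the Anthropic Messages API's cache_control field on system blocks (LONG_SYSTEM_PROMPT and cached_system_block are illustrative placeholders):

```python
LONG_SYSTEM_PROMPT = "You are a meticulous control-systems assistant. " * 200

def cached_system_block(text):
    # reads after the first call hit the cache at a steep input-token discount
    return [{"type": "text",
             "text": text,
             "cache_control": {"type": "ephemeral"}}]

# resp = client.messages.create(
#     model="claude-sonnet-4.6", max_tokens=1024,
#     system=cached_system_block(LONG_SYSTEM_PROMPT),
#     messages=[{"role": "user", "content": "Tune Kp for this plant..."}])
```

Caching only pays off when the marked prefix is long and repeated verbatim across calls, so keep it for stable system prompts, not per-request content.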

OpenAI

  • pip install openai
  • Set OPENAI_API_KEY
  • Use openai>=1.0
  • First call with chat.completions.create

xAI/Grok

  • Use OpenAI SDK
  • Set base_url to https://api.x.ai/v1
  • Set XAI_API_KEY (or reuse whatever env var name your client wrapper expects)
  • Start with grok-4.1-fast for cost-sensitive work before moving to flagship

Common Integration Gotchas

  • Rate limits are per-tier and spend-based on xAI and OpenAI. Monitor headers or console.
  • Anthropic requires max_tokens on every call. No default.
  • Token counts differ slightly across providers. Never assume 1:1 cost transfer.
  • Streaming responses consume rate limit the same as non-streaming. Plan accordingly.
  • Tool calling schemas must match provider expectations exactly or the call fails silently on structure.
  • Long context costs more on some models. Stay under the pricing threshold when possible (check current docs).
  • Always implement token usage logging. Costs add up faster than you expect on production traffic.

Use the cheapest model that clears your quality bar. Track real costs per feature. Retry logic belongs in one shared client wrapper. Test structured output with real schemas before shipping. The signal chain here is prompt to tokens to dollars; get the middle part right and the rest gets easier.

Related Guides

  • What Are AI Reasoning Tokens: Hidden Compute Costs. Hidden chain-of-thought computation in OpenAI o3 and DeepSeek R1 multiplies costs 5-20x during test-time compute.
  • FPGA vs Microcontroller: Which Runs Your Smart Home Hub. MCUs are preferred for lower cost, simpler updates, and better power in smart home hubs.
  • Zigbee vs Z-Wave: The Protocols Running Your Smart Home. Key tradeoffs in mesh behavior, RF reliability, and MCU overhead for smart home scaling.