
LLM API Integration Reference 2026

Reviewed by Josh Ausmus · Updated April 2026


Provider Comparison

| Provider | Auth Method | Base URL | Flagship Models | Rate Limits (typical Tier 2) | Pricing per 1M tokens (input/output) |
| Anthropic | x-api-key header | https://api.anthropic.com | claude-opus-4.6, claude-sonnet-4.6 | ~1k RPM, tiered TPM | $5/$25 (Opus), $3/$15 (Sonnet) |
| OpenAI | Bearer token | https://api.openai.com/v1 | gpt-5.4, gpt-5.4-mini | Tier-based, scales with spend | $2.50/$15 (gpt-5.4) |
| xAI/Grok | Bearer token | https://api.x.ai/v1 | grok-4.20, grok-4.1-fast | Tier by cumulative spend | $2-3/$6-15 (flagship), $0.20/$0.50 (fast) |

Basic Completion

# Python - OpenAI compatible (works for OpenAI and xAI)
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="https://api.x.ai/v1")  # drop base_url for OpenAI
resp = client.chat.completions.create(
    model="grok-4.20",
    messages=[{"role": "user", "content": "Explain PID loops."}],
    temperature=0.7,
    max_tokens=512,
)
print(resp.choices[0].message.content)
// TypeScript - OpenAI compatible
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.XAI_API_KEY,
  baseURL: 'https://api.x.ai/v1'
});
const resp = await client.chat.completions.create({
  model: 'grok-4.20',
  messages: [{ role: 'user', content: 'Explain PID loops.' }],
  temperature: 0.7,
  max_tokens: 512
});
console.log(resp.choices[0].message.content);

Anthropic Messages (Python + TS)

# Python Anthropic SDK 0.88.x
from anthropic import Anthropic

client = Anthropic(api_key="sk-ant-...")
resp = client.messages.create(
    model="claude-sonnet-4.6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain PID loops."}],
)
print(resp.content[0].text)
// TypeScript Anthropic
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const resp = await client.messages.create({
  model: "claude-sonnet-4.6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain PID loops." }]
});
console.log(resp.content[0].text);

Streaming

Pass stream=True for OpenAI-compatible clients and iterate the returned chunks. For Anthropic, use the client.messages.stream() context-manager helper or pass stream=True to messages.create().
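A minimal sketch of the OpenAI-compatible side (stream_reply is an illustrative helper name, not an SDK function; the client is passed in so the same code runs against OpenAI or xAI by changing base_url):

```python
def stream_reply(client, model, prompt):
    """Yield text deltas as they arrive; the caller assembles the full reply."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
        stream=True,  # server returns an iterator of chunk objects
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # role-only and tool-call chunks carry no text
            yield delta

# from openai import OpenAI
# client = OpenAI()  # or OpenAI(base_url="https://api.x.ai/v1", api_key=...)
# print("".join(stream_reply(client, "gpt-5.4", "Explain PID loops.")))
```

Yielding deltas instead of printing them keeps the helper reusable for both console display and assembling a full string.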

Tool Calling

OpenAI and xAI take function tools in the request (the JSON Schema nested under function.parameters). Anthropic uses a tools array with the schema under input_schema. The calling pattern is the same but the schema envelope differs, so test both in the same codebase behind a thin wrapper.
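A sketch of that thin wrapper: one tool definition, emitted in each provider's shape (the function names and the get_weather example are illustrative, not from any SDK):

```python
def to_openai_tool(name, description, parameters):
    # OpenAI/xAI: "function" tool with the JSON Schema under "parameters"
    return {"type": "function",
            "function": {"name": name,
                         "description": description,
                         "parameters": parameters}}

def to_anthropic_tool(name, description, parameters):
    # Anthropic: flat tool object with the schema under "input_schema"
    return {"name": name,
            "description": description,
            "input_schema": parameters}

schema = {"type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]}
# tools=[to_openai_tool("get_weather", "Look up current weather", schema)]
# tools=[to_anthropic_tool("get_weather", "Look up current weather", schema)]
```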

Structured Output (JSON mode)

# OpenAI / xAI
resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=[...],
    response_format={"type": "json_object"},  # or {"type": "json_schema", ...}
    tools=[{"type": "function", "function": {...}}],
)

Anthropic supports tool_choice with JSON schema tools for guaranteed structure.
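A sketch of that pattern, assuming the Anthropic SDK: force the tool call with tool_choice, then pull the payload out of the tool_use content block (record_result and extract_tool_input are illustrative names):

```python
def extract_tool_input(resp, tool_name):
    """Return the input dict of the first matching tool_use block."""
    for block in resp.content:
        if block.type == "tool_use" and block.name == tool_name:
            return block.input  # already parsed against the tool's schema
    raise ValueError(f"model did not call {tool_name}")

# resp = client.messages.create(
#     model="claude-sonnet-4.6", max_tokens=1024,
#     tools=[{"name": "record_result",
#             "description": "Record the extracted fields.",
#             "input_schema": {"type": "object",
#                              "properties": {"sentiment": {"type": "string"}},
#                              "required": ["sentiment"]}}],
#     tool_choice={"type": "tool", "name": "record_result"},  # forces the call
#     messages=[{"role": "user", "content": "Classify: 'Great latency!'"}])
# data = extract_tool_input(resp, "record_result")
```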

Error Handling: Retry with Exponential Backoff

import time
from openai import RateLimitError, APIConnectionError, APITimeoutError

def call_with_retry(client, max_retries=5, **kwargs):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            wait = (2 ** attempt) * 0.5 + 0.1  # backoff plus small jitter
            time.sleep(wait)
        except (APIConnectionError, APITimeoutError):
            time.sleep(2 ** attempt)
        except Exception as e:
            # catch-all for timeout-like errors the SDK surfaces differently
            if "timeout" in str(e).lower():
                time.sleep(1)
            else:
                raise
    raise Exception("Max retries exceeded")

Cost Tracking Middleware Pattern

Wrap the client call. Track input and output tokens from response.usage and log them or send them to your observability layer (Prometheus, a custom DB). Add a context manager that accumulates per-request and per-session totals. For Anthropic, read the usage object on the response (input_tokens/output_tokens rather than prompt_tokens/completion_tokens).
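A minimal sketch of that context manager, assuming OpenAI-style usage field names and illustrative per-million-token prices (swap the field names for Anthropic and the print for your metrics sink):

```python
class CostTracker:
    def __init__(self, in_price_per_m, out_price_per_m):
        self.in_price = in_price_per_m / 1_000_000   # dollars per input token
        self.out_price = out_price_per_m / 1_000_000  # dollars per output token
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, usage):
        # OpenAI/xAI shape; for Anthropic use usage.input_tokens / output_tokens
        self.input_tokens += usage.prompt_tokens
        self.output_tokens += usage.completion_tokens

    @property
    def dollars(self):
        return (self.input_tokens * self.in_price
                + self.output_tokens * self.out_price)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # replace print with Prometheus counter, DB row, or structured log
        print(f"session: {self.input_tokens} in / {self.output_tokens} out "
              f"= ${self.dollars:.4f}")
        return False

# with CostTracker(3, 15) as tracker:       # Sonnet-tier prices from the table
#     resp = client.chat.completions.create(...)
#     tracker.record(resp.usage)
```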

Setup Checklist

Anthropic

  • pip install "anthropic==0.88.*" (pin the minor version for reproducible builds)
  • Set ANTHROPIC_API_KEY env var
  • First call: use Messages API with max_tokens required
  • Enable prompt caching on repeated system prompts for 90%+ savings
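A sketch of the prompt-caching item above, assuming the Anthropic Messages API's cache_control field on system blocks (LONG_SYSTEM_PROMPT and cached_system_block are illustrative placeholders):

```python
LONG_SYSTEM_PROMPT = "You are a meticulous control-systems assistant. " * 200

def cached_system_block(text):
    # reads after the first call hit the cache at a steep input-token discount
    return [{"type": "text",
             "text": text,
             "cache_control": {"type": "ephemeral"}}]

# resp = client.messages.create(
#     model="claude-sonnet-4.6", max_tokens=1024,
#     system=cached_system_block(LONG_SYSTEM_PROMPT),
#     messages=[{"role": "user", "content": "Tune Kp for this plant..."}])
```

Caching only pays off when the marked prefix is long and repeated verbatim across calls, so keep it for stable system prompts, not per-request content.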

OpenAI

  • pip install openai
  • Set OPENAI_API_KEY
  • Use openai>=1.0
  • First call with chat.completions.create

xAI/Grok

  • Use OpenAI SDK
  • Set base_url to https://api.x.ai/v1
  • Set XAI_API_KEY (or reuse whatever env var name your client wrapper expects)
  • Start with grok-4.1-fast for cost-sensitive work before moving to flagship

Common Integration Gotchas

  • Rate limits are per-tier and spend-based on xAI and OpenAI. Monitor headers or console.
  • Anthropic requires max_tokens on every call. No default.
  • Token counts differ slightly across providers. Never assume 1:1 cost transfer.
  • Streaming responses consume rate limit the same as non-streaming. Plan accordingly.
  • Tool calling schemas must match provider expectations exactly or the call fails silently on structure.
  • Long context costs more on some models. Stay under the pricing threshold when possible (check current docs).
  • Always implement token usage logging. Costs add up faster than you expect on production traffic.

Use the cheapest model that clears your quality bar. Track real costs per feature. Retry logic belongs in one shared client wrapper. Test structured output with real schemas before shipping. The signal chain here is prompt to tokens to dollars; get the middle part right and the rest gets easier.

Related Guides

  • What Are AI Reasoning Tokens: Hidden Compute Costs. Hidden chain-of-thought computation in OpenAI o3 and DeepSeek R1 multiplies costs 5-20x during test-time compute.
  • FPGA vs Microcontroller: Which Runs Your Smart Home Hub. MCUs are preferred for lower cost, simpler updates, and better power in smart home hubs.
  • Zigbee vs Z-Wave: The Protocols Running Your Smart Home. Key tradeoffs in mesh behavior, RF reliability, and MCU overhead for smart home scaling.