
AI Agent Architecture Reference

Reviewed by Josh Ausmus · Updated April 2026


How AI Agents Work Under the Hood

ReAct (thought-action-observation loop). The agent alternates between generating a thought, selecting an action or tool, executing it, and incorporating the observation back into context. It repeats this loop until it has enough information to answer. This handles dynamic tasks where outcomes are uncertain. ReAct grounds the model in real feedback instead of hallucinated steps.[1]
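
The loop above can be sketched in a few lines. Here the "model" is a scripted stub (`fake_model`) standing in for a real LLM, and the tool name and loop shape are illustrative assumptions rather than any specific framework's API:

```python
# Minimal ReAct-style loop: thought -> action -> observation, repeated.

def lookup_population(city: str) -> str:
    # Hypothetical tool: a tiny local lookup table.
    data = {"Paris": "about 2.1 million"}
    return data.get(city, "unknown")

TOOLS = {"lookup_population": lookup_population}

def fake_model(transcript: str) -> dict:
    # Stand-in for an LLM: emits a thought plus either an action or a final answer.
    if "Observation:" not in transcript:
        return {"thought": "I need the population of Paris.",
                "action": "lookup_population", "input": "Paris"}
    return {"thought": "I have the observation; I can answer.",
            "final": "Paris has about 2.1 million residents."}

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):          # loop until the model answers or we hit the cap
        step = fake_model(transcript)
        transcript += f"\nThought: {step['thought']}"
        if "final" in step:
            return step["final"]
        observation = TOOLS[step["action"]](step["input"])  # execute the chosen tool
        transcript += f"\nAction: {step['action']}\nObservation: {observation}"
    return "Gave up after max_steps."

print(react("What is the population of Paris?"))
```

The `max_steps` cap is the standard guard against the loop's main failure mode: an agent that never decides it has enough information.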

Plan-and-Execute (upfront planning, then execution). The model first creates a complete step-by-step plan, then executes each step in sequence with minimal replanning. It trades adaptability for predictability and lower per-step token use. Good when the task decomposes cleanly and the environment stays stable. It fails harder if the initial plan misses key details.[1]
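
A minimal sketch of the plan-then-execute split, assuming both the planner and the per-step executor are stubs standing in for model calls (the step names and shared-state shape are illustrative):

```python
# Plan-and-Execute: one upfront plan, then fixed sequential execution.

def plan(task: str) -> list[str]:
    # Stub planner: a real system would ask the model for this list once.
    return ["extract the two numbers", "add them", "format the answer"]

def execute_step(step: str, state: dict) -> dict:
    # Stub executor: each branch mimics one model/tool call acting on shared state.
    if step == "extract the two numbers":
        state["nums"] = [int(w) for w in state["task"].split() if w.isdigit()]
    elif step == "add them":
        state["sum"] = sum(state["nums"])
    elif step == "format the answer":
        state["answer"] = f"The total is {state['sum']}."
    return state

def plan_and_execute(task: str) -> str:
    state = {"task": task}
    for step in plan(task):     # no replanning: the plan is fixed upfront
        state = execute_step(step, state)
    return state["answer"]

print(plan_and_execute("Add 17 and 25 together"))  # The total is 42.
```

Note that the plan is computed once and never revisited; that is exactly the brittleness the pattern trades away for predictability.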

Tool Calling / Function Calling (structured tool use). The LLM outputs structured JSON or schema-compliant calls that the framework routes to predefined functions or APIs. The system parses the call, runs the tool, and injects the result back into the conversation. This gives reliable, typed interactions without free-form text parsing. Most modern models ship with native support for this.[2]
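
The parse-route-execute step can be sketched like this. The `{"name": ..., "arguments": ...}` shape is a common convention, but exact field names vary by provider; the weather schema and `get_weather` stub are illustrative:

```python
import json

# Tool calling: parse a schema-compliant call, route it, run the function.

WEATHER_SCHEMA = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    return f"Sunny in {city}"   # stub: a real tool would call a weather API

REGISTRY = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    call = json.loads(model_output)    # parse the structured call
    fn = REGISTRY[call["name"]]        # route by the declared tool name
    return fn(**call["arguments"])     # typed keyword arguments, no text parsing

# Pretend the model emitted this schema-compliant call:
print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))  # Sunny in Oslo
```

In a real system the schema is sent to the model with the request, and the tool result would be appended to the conversation for the next generation.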

Code Execution Agents (sandboxed code running). The agent writes Python or other code, sends it to a sandboxed interpreter, and receives stdout, results, or errors. It iterates by fixing or extending the code based on output. This works well for math, data analysis, or procedural tasks. The sandbox must limit CPU, memory, and network access or it becomes dangerous.
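
The write-run-fix iteration can be sketched with a child interpreter. To be clear: a subprocess with a timeout is not a real sandbox (no memory or network limits); production systems use isolated runtimes. The `revise` stub stands in for the model repairing its own code:

```python
import subprocess
import sys

# Code-execution agent: run the draft, feed errors back, retry.

def run_code(code: str) -> tuple[bool, str]:
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, timeout=5)
    ok = proc.returncode == 0
    return ok, proc.stdout if ok else proc.stderr

def revise(code: str, error: str) -> str:
    # Stub "model": repairs one known typo for illustration.
    return code.replace("pront", "print")

def code_agent(first_draft: str, max_attempts: int = 3) -> str:
    code = first_draft
    for _ in range(max_attempts):
        ok, output = run_code(code)
        if ok:
            return output.strip()
        code = revise(code, output)   # feed the error back and try again
    return "failed"

print(code_agent("pront(6 * 7)"))  # 42
```

The first run fails with a NameError, the revision fixes it, and the second run succeeds; that error-driven retry is the core of the pattern.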

Multi-Agent: Supervisor pattern. One coordinator agent decomposes the task, assigns subtasks to specialized worker agents via routing or tool calls, then synthesizes their outputs. Workers run independently with narrow tools and scopes. This scales better than one monolithic agent on complex jobs. The supervisor becomes a bottleneck if coordination logic grows too complex.[1]
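
A minimal sketch of decompose, route, synthesize. The routing keys and worker functions are illustrative stubs for specialized agents; a real supervisor would ask a model to produce the subtask list:

```python
# Supervisor pattern: coordinator routes subtasks to narrow workers.

def research_worker(subtask: str) -> str:
    return f"[research notes on: {subtask}]"

def writing_worker(subtask: str) -> str:
    return f"[draft paragraph for: {subtask}]"

WORKERS = {"research": research_worker, "write": writing_worker}

def supervisor(task: str) -> str:
    # Stub decomposition: a real supervisor derives this from a model call.
    subtasks = [("research", f"background for '{task}'"),
                ("write", f"summary of '{task}'")]
    results = [WORKERS[role](sub) for role, sub in subtasks]  # route to workers
    return "\n".join(results)   # synthesis step (here: simple concatenation)

print(supervisor("agent memory patterns"))
```

Each worker sees only its own narrow subtask, which is what keeps the individual agents simple even as the overall job grows.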

Multi-Agent: Debate pattern. Multiple agents take opposing or varied positions, critique each other's outputs in rounds, and a moderator or judge synthesizes the best conclusion. It surfaces flaws and improves reasoning on ambiguous or high-stakes topics. Communication overhead adds latency and tokens. Works when you can afford multiple inference passes.[3]
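
The round structure can be sketched as below. All three roles stand in for separate LLM calls, and the judge's count-based scoring is purely illustrative (a real judge would be another model evaluating argument quality):

```python
# Debate pattern: two stub agents argue in rounds, a stub judge decides.

def agent_pro(topic: str, round_no: int) -> str:
    return f"Pro point {round_no}: {topic} improves reliability."

def agent_con(topic: str, round_no: int) -> str:
    return f"Con point {round_no}: {topic} adds latency."

def judge(transcript: list[str]) -> str:
    # Stub judge: tally each side's contributions.
    pro = sum(1 for line in transcript if line.startswith("Pro"))
    con = sum(1 for line in transcript if line.startswith("Con"))
    return "pro" if pro >= con else "con"

def debate(topic: str, rounds: int = 2) -> str:
    transcript = []
    for r in range(1, rounds + 1):      # each round = one inference pass per agent
        transcript.append(agent_pro(topic, r))
        transcript.append(agent_con(topic, r))
    return judge(transcript)            # moderator synthesizes the outcome

print(debate("multi-agent debate"))
```

The cost note in the description is visible in the structure: every round multiplies the number of inference passes by the number of debaters.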

Multi-Agent: Pipeline pattern. Agents run in fixed sequential or staged order. Each takes the prior agent's output as input and performs one specialized transformation. Simple to debug and trace. Rigid structure limits adaptation if early stages produce poor data. Common for content generation or data processing workflows.[4]
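
The fixed staging amounts to function composition. A sketch with three stub stages (the stage names and transformations are illustrative stand-ins for agents):

```python
# Pipeline pattern: stages composed in fixed order, each consuming the
# previous stage's output.

def outline_stage(topic: str) -> list[str]:
    return [f"intro to {topic}", f"details of {topic}"]

def draft_stage(outline: list[str]) -> str:
    return " ".join(f"<{section}>" for section in outline)

def edit_stage(draft: str) -> str:
    return draft.upper()    # stub "editor" transformation

PIPELINE = [outline_stage, draft_stage, edit_stage]

def run_pipeline(topic: str):
    data = topic
    for stage in PIPELINE:      # strictly sequential; no replanning
        data = stage(data)
    return data

print(run_pipeline("caching"))  # <INTRO TO CACHING> <DETAILS OF CACHING>
```

Because each stage is a pure transformation of its input, you can inspect the intermediate value between any two stages, which is why pipelines are easy to debug and trace.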

Memory: Context window (short-term). The raw token buffer that holds the current conversation, recent thoughts, tool outputs, and system prompt. Everything in the window influences the next generation directly. Hits hard limits fast on long interactions. Most frameworks truncate or summarize when it fills.
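
The truncate-when-full behavior can be sketched as a token-budgeted buffer that drops the oldest turns first. Whitespace word count stands in for a real tokenizer, and the budget is an illustrative number:

```python
# Context window sketch: keep the newest messages that fit the token budget.

def fit_to_window(messages: list[str], budget: int = 15) -> list[str]:
    kept, used = [], 0
    for msg in reversed(messages):      # walk newest-first
        cost = len(msg.split())         # crude token estimate
        if used + cost > budget:
            break                       # oldest turns fall out of context
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = ["turn one " * 4, "turn two " * 4, "turn three " * 2]
print(fit_to_window(history))           # the oldest turn is dropped
```

Real frameworks often summarize the dropped turns instead of discarding them outright, which is where the long-term memory layers below come in.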

Memory: Vector store (long-term semantic). Embeddings of past interactions, documents, or facts are stored in a vector database and retrieved by similarity when the current query matches. This pulls relevant history without blowing up the context window. Retrieval quality depends on chunking and embedding model. Adds latency for the query step.
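
Similarity retrieval can be sketched with cosine similarity over toy, hand-made "embeddings". A real system would use an embedding model and an indexed vector database; the 3-dimensional vectors here are illustrative stand-ins:

```python
import math

# Vector-store sketch: rank stored memories by cosine similarity to the query.

STORE = [
    ("user prefers metric units", [0.9, 0.1, 0.0]),
    ("user's dog is named Rex",   [0.0, 0.2, 0.9]),
    ("user works in celsius",     [0.8, 0.3, 0.1]),
]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def retrieve(query_vec: list[float], k: int = 2) -> list[str]:
    ranked = sorted(STORE, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]   # top-k memories for the prompt

# A query embedding "near" the units/temperature memories:
print(retrieve([0.9, 0.2, 0.0]))
```

Only the top-k retrieved snippets are injected into the prompt, which is how the pattern keeps the context window small regardless of how much history the store holds.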

Memory: Conversation history (session tracking). Persistent log of user messages, agent responses, and key decisions across turns or sessions. Often stored in a database and selectively injected or summarized into the context window. Handles multi-turn coherence better than raw context alone. Requires summarization logic for very long sessions.
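
The log-then-summarize-then-inject flow can be sketched as below. `summarize` is a stub for a model call, the in-memory list stands in for a database table, and the `keep_last` cutoff is an illustrative choice:

```python
# Conversation-history sketch: persist turns, inject a summary plus the
# last few raw turns into the prompt.

LOG: list[dict] = []   # stands in for a database table of turns

def record(role: str, text: str) -> None:
    LOG.append({"role": role, "text": text})

def summarize(turns: list[dict]) -> str:
    # Stub summarizer: a real system would ask the model to compress these.
    return f"(summary of {len(turns)} earlier turns)"

def build_context(keep_last: int = 2) -> str:
    older, recent = LOG[:-keep_last], LOG[-keep_last:]
    parts = [summarize(older)] if older else []
    parts += [f"{t['role']}: {t['text']}" for t in recent]   # raw recent turns
    return "\n".join(parts)

record("user", "My name is Ada.")
record("agent", "Nice to meet you, Ada.")
record("user", "What's my name?")
print(build_context())
```

The key design choice is the cutoff between summarized and verbatim turns: too aggressive and the agent loses detail, too lax and the context window fills up again.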

Pattern Comparison

| Pattern | Best For | Weakness | Example Framework |
|---|---|---|---|
| ReAct | Dynamic tasks, tool exploration | High token use, unpredictable cost | LangChain, LlamaIndex |
| Plan-and-Execute | Structured, predictable workflows | Brittle if plan is wrong | LangGraph, AutoGen |
| Tool Calling | Reliable API/tool integration | Limited to predefined schemas | OpenAI, Anthropic |
| Code Execution | Math, data, procedural generation | Sandbox escape risk, slow loops | E2B, LangChain code |
| Supervisor (Multi) | Task routing across specialties | Coordinator bottleneck | CrewAI, LangGraph |
| Debate (Multi) | Reasoning, fact checking | High latency and cost | AutoGen, custom |
| Pipeline (Multi) | Linear multi-stage processing | No adaptation to early errors | LangGraph, CrewAI |
| Context Window | Immediate short-term state | Strict length limits | All LLMs |
| Vector Store | Long-term knowledge retrieval | Retrieval quality varies | Pinecone, Chroma, Redis |
| Conversation History | Session coherence | Needs summarization at scale | LangChain memory |

If your task has high uncertainty or changing data, start with ReAct or a supervisor. Stable pipelines reward Plan-and-Execute or strict sequential agents. Memory choices matter more than the reasoning loop once sessions exceed a few thousand tokens. The real system is usually a hybrid.
