How AI Agents Work Under the Hood
ReAct (thought-action-observation loop) The agent alternates between generating a thought, selecting an action or tool, executing it, and incorporating the observation back into context. It repeats this loop until it has enough information to answer. This handles dynamic tasks where outcomes are uncertain. ReAct grounds the model in real feedback instead of hallucinated steps.[1]
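The loop above can be sketched in a few lines. This is a minimal illustration, not any framework's API: `fake_model` is a scripted stand-in for a real LLM call, and the `calculator[...]` action format is an assumed convention chosen to make the control flow visible.

```python
# Minimal ReAct loop sketch. `fake_model` is a hypothetical stand-in for
# an LLM call, scripted so the thought-action-observation cycle is visible.

def fake_model(context: str) -> str:
    # Pretend the model picks its next step from the transcript so far.
    if "Observation: 42" in context:
        return "Finish: The answer is 42."
    return "Action: calculator[6 * 7]"

def run_tool(action: str) -> str:
    # Hypothetical tool router: parse "calculator[expr]" and evaluate it.
    expr = action.split("[", 1)[1].rstrip("]")
    return str(eval(expr))  # demo only; never eval untrusted model output

def react_loop(question: str, max_steps: int = 5) -> str:
    context = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_model(context)
        if step.startswith("Finish:"):          # model has enough information
            return step.removeprefix("Finish:").strip()
        observation = run_tool(step.removeprefix("Action:").strip())
        context += f"\n{step}\nObservation: {observation}"  # feed result back
    return "Step budget exhausted."

print(react_loop("What is 6 * 7?"))
```

The `max_steps` cap matters in practice: without it, a confused model can loop indefinitely and burn tokens.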
Plan-and-Execute (upfront planning then execution) The model first creates a complete step-by-step plan, then executes each step in sequence with minimal replanning. It trades adaptability for predictability and lower per-step token use. Good when the task decomposes cleanly and the environment stays stable. It fails harder if the initial plan misses key details.[1]
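A rough shape of the pattern, with both model calls stubbed out: `plan` stands in for the planning call (here returning a fixed, hypothetical decomposition) and `execute_step` for the per-step executor.

```python
# Plan-and-Execute sketch. `plan` and `execute_step` stand in for two
# separate model calls: one drafts the whole plan, one runs each step.

def plan(task: str) -> list[str]:
    # Hypothetical planner output: a fixed decomposition for illustration.
    return ["fetch sales data", "compute monthly totals", "write summary"]

def execute_step(step: str, results: list[str]) -> str:
    # Hypothetical executor: each step can see prior results.
    return f"done: {step}"

def plan_and_execute(task: str) -> list[str]:
    results: list[str] = []
    for step in plan(task):                       # plan once, up front
        results.append(execute_step(step, results))  # then execute in order
    return results

print(plan_and_execute("summarize monthly sales"))
```

Note there is no feedback edge from execution back to planning; that is exactly the trade described above, and why a wrong initial plan fails hard.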
Tool Calling / Function Calling (structured tool use) The LLM outputs structured JSON or schema-compliant calls that the framework routes to predefined functions or APIs. The system parses the call, runs the tool, and injects the result back into the conversation. This gives reliable, typed interactions without free-form text parsing. Most modern models ship with native support for this.[2]
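A minimal dispatch layer might look like this. The `get_weather` tool and the JSON shape are illustrative assumptions, not any vendor's actual wire format, but the mechanics (parse, route via a registry, invoke with typed arguments) are the same.

```python
import json

# Tool-calling sketch: the "model output" is a structured JSON call that
# the framework parses, routes through a tool registry, and executes.
# The weather tool and JSON shape are illustrative, not a vendor API.

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub tool; a real one would hit an API

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    call = json.loads(model_output)     # parse the structured call
    fn = TOOLS[call["name"]]            # route to the registered function
    return fn(**call["arguments"])      # typed invocation, no text parsing

model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
print(dispatch(model_output))  # result is injected back into the conversation
```

Production versions add schema validation before dispatch and return errors to the model as observations instead of raising.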
Code Execution Agents (sandboxed code running) The agent writes Python or other code, sends it to a sandboxed interpreter, and receives stdout, results, or errors. It iterates by fixing or extending the code based on output. This works well for math, data analysis, or procedural tasks. The sandbox must limit CPU time, memory, filesystem, and network access, or model-written code becomes a serious security risk.
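A bare-bones version of the execute-and-observe step, using a subprocess with a timeout. This is a sketch of the feedback loop only; a subprocess is not a real sandbox, and production systems (e.g. E2B) add memory, filesystem, and network isolation.

```python
import subprocess
import sys

# Code-execution sketch: run model-written code in a subprocess with a
# timeout, capturing stdout/stderr so the agent can iterate on errors.
# NOTE: a subprocess alone is NOT a sandbox; real systems add isolation.

def run_code(code: str, timeout: float = 5.0) -> tuple[str, str]:
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.stdout, proc.stderr

out, err = run_code("print(sum(range(10)))")
print(out.strip())  # the agent reads this observation and decides its next step
```

On an error, `err` is non-empty and gets fed back to the model, which typically responds with a corrected version of the code.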
Multi-Agent: Supervisor pattern One coordinator agent decomposes the task, assigns subtasks to specialized worker agents via routing or tool calls, then synthesizes their outputs. Workers run independently with narrow tools and scopes. This scales better than one monolithic agent on complex jobs. The supervisor becomes a bottleneck if coordination logic grows too complex.[1]
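The routing structure can be sketched with plain functions as workers. The decomposition here is hard-coded for clarity; in a real supervisor (CrewAI, LangGraph) the coordinator is itself an LLM that decides the subtasks and routing.

```python
# Supervisor sketch: one coordinator routes subtasks to narrow workers
# and synthesizes their outputs. Worker functions are hypothetical
# stand-ins for specialized agents with their own tools and prompts.

def research_worker(subtask: str) -> str:
    return f"facts about {subtask}"

def writer_worker(subtask: str) -> str:
    return f"draft covering {subtask}"

WORKERS = {"research": research_worker, "write": writer_worker}

def supervisor(task: str) -> str:
    # Hard-coded decomposition; a real supervisor uses an LLM to decide
    # which workers to invoke and in what order.
    subtasks = [("research", task), ("write", task)]
    outputs = [WORKERS[role](sub) for role, sub in subtasks]
    return " | ".join(outputs)  # synthesis step

print(supervisor("solar panels"))
```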
Multi-Agent: Debate pattern Multiple agents take opposing or varied positions, critique each other's outputs in rounds, and a moderator or judge synthesizes the best conclusion. It surfaces flaws and improves reasoning on ambiguous or high-stakes topics. Communication overhead adds latency and tokens. Works when you can afford multiple inference passes.[3]
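The round structure looks roughly like this, with all three roles (two debaters, one judge) stubbed out; in a real system each would be a separate LLM call, which is where the latency and token cost come from.

```python
# Debate sketch: two scripted "agents" exchange critiques for a fixed
# number of rounds, then a judge synthesizes. All three roles would be
# LLM calls in a real system; they are stubs here to show the structure.

def agent(position: str, opponent_last: str) -> str:
    return f"{position}; rebutting '{opponent_last}'"

def debate(topic: str, rounds: int = 2) -> tuple[str, list[str]]:
    a_last, b_last = f"pro {topic}", f"con {topic}"   # opening positions
    transcript = [a_last, b_last]
    for _ in range(rounds):
        a_last = agent(f"pro {topic}", b_last)        # A critiques B
        b_last = agent(f"con {topic}", a_last)        # B critiques A
        transcript += [a_last, b_last]
    # Hypothetical judge: would read the full transcript and synthesize.
    return f"verdict after {rounds} rounds on {topic}", transcript

verdict, transcript = debate("tabs vs spaces")
print(verdict)
```

Each round adds two inference passes, so cost grows linearly with `rounds`, which is the overhead the paragraph above warns about.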
Multi-Agent: Pipeline pattern Agents run in fixed sequential or staged order. Each takes the prior agent's output as input and performs one specialized transformation. Simple to debug and trace. Rigid structure limits adaptation if early stages produce poor data. Common for content generation or data processing workflows.[4]
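Since each stage is one transformation of the prior stage's output, the whole pattern reduces to function composition. The three stages below are illustrative stand-ins for per-stage agents.

```python
# Pipeline sketch: strictly sequential stages, each consuming the prior
# stage's output. Plain functions stand in for per-stage agents.

def extract(text: str) -> list[str]:
    return text.split()

def transform(tokens: list[str]) -> list[str]:
    return [t.upper() for t in tokens]

def load(tokens: list[str]) -> str:
    return " ".join(tokens)

def pipeline(text, stages=(extract, transform, load)):
    data = text
    for stage in stages:   # fixed order, no backtracking to earlier stages
        data = stage(data)
    return data

print(pipeline("hello agent world"))
```

The easy debuggability comes from this shape: log `data` between stages and you can see exactly where quality degrades.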
Memory: Context window (short-term) The raw token buffer that holds the current conversation, recent thoughts, tool outputs, and system prompt. Everything in the window influences the next generation directly. Hits hard limits fast on long interactions. Most frameworks truncate or summarize when it fills.
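The truncation strategy mentioned above can be sketched as a rolling budget over recent turns. Token counting is faked as word count here; real frameworks use the model's tokenizer, and many summarize dropped turns instead of discarding them.

```python
# Context-window sketch: keep the most recent turns that fit a token
# budget, dropping the oldest. Word count fakes tokenization here.

def count_tokens(text: str) -> int:
    return len(text.split())  # stand-in for a real tokenizer

def trim_context(messages: list[str], budget: int) -> list[str]:
    kept, used = [], 0
    for msg in reversed(messages):      # walk newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break                       # oldest turns fall off the window
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = ["old turn one", "old turn two", "latest user question"]
print(trim_context(history, budget=6))
```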
Memory: Vector store (long-term semantic) Embeddings of past interactions, documents, or facts are stored in a vector database and retrieved by similarity when the current query matches. This pulls relevant history without blowing up the context window. Retrieval quality depends on chunking and embedding model. Adds latency for the query step.
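The embed-store-retrieve cycle can be shown end to end with toy embeddings. The word-overlap "embedding" and the tiny fixed vocabulary are deliberate simplifications; a real system uses a learned embedding model and an approximate-nearest-neighbor index (Pinecone, Chroma, etc.).

```python
import math

# Vector-store sketch: toy word-overlap "embeddings" plus exact cosine
# retrieval. Real systems use learned embeddings and an ANN index.

VOCAB = ["refund", "policy", "shipping", "times", "reset", "password"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(w in words) for w in VOCAB]   # crude bag-of-words vector

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

DOCS = ["refund policy details", "shipping times overview", "reset password steps"]
INDEX = [(doc, embed(doc)) for doc in DOCS]     # "write" path: embed and store

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)                            # "read" path: embed the query
    scored = sorted(INDEX, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in scored[:k]]

print(retrieve("how do I reset my password"))
```

The retrieval-quality caveat above shows up directly here: if the chunking or embedding misses the query's vocabulary, cosine similarity returns the wrong document with full confidence.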
Memory: Conversation history (session tracking) Persistent log of user messages, agent responses, and key decisions across turns or sessions. Often stored in a database and selectively injected or summarized into the context window. Handles multi-turn coherence better than raw context alone. Requires summarization logic for very long sessions.
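A minimal shape for the log-then-inject flow, with an in-memory dict standing in for the database and a placeholder where the summarization call would go.

```python
# Conversation-history sketch: a per-session log plus a build step that
# compresses old turns before injection. The dict stands in for a
# database; the summary line stands in for a real LLM summarizer call.

SESSIONS: dict[str, list[tuple[str, str]]] = {}

def log_turn(session_id: str, role: str, text: str) -> None:
    SESSIONS.setdefault(session_id, []).append((role, text))

def build_context(session_id: str, keep_recent: int = 2) -> str:
    turns = SESSIONS.get(session_id, [])
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    parts = []
    if old:
        # Hypothetical summarizer: a real one would be an LLM call over `old`.
        parts.append(f"[summary of {len(old)} earlier turns]")
    parts += [f"{role}: {text}" for role, text in recent]
    return "\n".join(parts)

log_turn("s1", "user", "hi")
log_turn("s1", "agent", "hello")
log_turn("s1", "user", "what did I say first?")
print(build_context("s1"))
```

Only the recent turns plus the summary enter the context window, which is how this layer stays coherent across long sessions without blowing the token budget.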
Pattern Comparison
| Pattern | Best For | Weakness | Example Framework |
|---|---|---|---|
| ReAct | Dynamic tasks, tool exploration | High token use, unpredictable cost | LangChain, LlamaIndex |
| Plan-and-Execute | Structured, predictable workflows | Brittle if plan is wrong | LangGraph, AutoGen |
| Tool Calling | Reliable API/tool integration | Limited to predefined schemas | OpenAI, Anthropic |
| Code Execution | Math, data, procedural generation | Sandbox escape risk, slow loops | E2B, LangChain code |
| Supervisor (Multi) | Task routing across specialties | Coordinator bottleneck | CrewAI, LangGraph |
| Debate (Multi) | Reasoning, fact checking | High latency and cost | AutoGen, custom |
| Pipeline (Multi) | Linear multi-stage processing | No adaptation to early errors | LangGraph, CrewAI |
| Context Window | Immediate short-term state | Strict length limits | All LLMs |
| Vector Store | Long-term knowledge retrieval | Retrieval quality varies | Pinecone, Chroma, Redis |
| Conversation History | Session coherence | Needs summarization at scale | LangChain memory |
If your task has high uncertainty or changing data, start with ReAct or supervisor. Stable pipelines reward Plan-and-Execute or strict sequential agents. Memory choices matter more than the reasoning loop once sessions exceed a few thousand tokens. The real system is usually a hybrid.