
Prompt Engineering Formulas 2026

Reviewed by Josh Ausmus · Updated April 2026

Chain-of-Thought vs Tree-of-Thought vs ReAct

Use the right reasoning pattern. CoT handles most linear problems cheaply. ToT explores alternatives on hard search spaces at a steep token cost. ReAct runs the Thought-Action-Observation loop when tools matter.

| Technique | When to Use | Strengths | Weaknesses |
| --- | --- | --- | --- |
| Chain-of-Thought (CoT) | Math, logic, code analysis, single-path explanations | Simple to prompt. Boosts accuracy 20-40% on reasoning tasks. Low token overhead. Works on Claude Sonnet 4.6 and Grok 4.20. | No exploration of alternatives. Can get stuck in one wrong path. |
| Tree-of-Thought (ToT) | Planning, puzzles, creative strategy with multiple branches | Explores alternatives. Self-evaluates branches. Strong on Game-of-24 style tasks. | High token cost. Needs multiple generations or search. Fragile in practice. |
| ReAct | Tool-using agents, research, web search, multi-step execution | Alternates Thought-Action-Observation. Grounds outputs in real results. Dominant pattern for agents in 2026. | Requires tool infrastructure. Can loop or call too many tools. |
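The ReAct row above can be sketched as a minimal loop. Everything here is a stand-in: `fake_model` replaces a real LLM call and `search_tool` returns a canned observation, but the Thought-Action-Observation plumbing is the actual pattern.

```python
# Minimal ReAct-style loop sketch. The model and tool below are
# deterministic stubs, not real APIs; a production agent would swap
# `fake_model` for an LLM call that emits Thought/Action lines.

def search_tool(query: str) -> str:
    """Hypothetical tool: returns a canned observation for the demo."""
    return f"Top result for '{query}': Paris is the capital of France."

TOOLS = {"search": search_tool}

def react_loop(model, question: str, max_turns: int = 3) -> str:
    """Alternate Thought -> Action -> Observation until the model answers."""
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        step = model(transcript)              # model emits the next step
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):
            name, _, arg = step.removeprefix("Action:").strip().partition(" ")
            obs = TOOLS.get(name, lambda a: "unknown tool")(arg)
            transcript += f"Observation: {obs}\n"
    return "No answer within turn limit."

def fake_model(transcript: str) -> str:
    # Deterministic stand-in: think once, act once, then answer.
    if "Observation:" in transcript:
        return "Final Answer: Paris"
    if "Thought:" in transcript:
        return "Action: search capital of France"
    return "Thought: I should look this up."
```

The `max_turns` cap matters: it is the cheapest defense against the "can loop or call too many tools" weakness in the table.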

Structured Output Patterns

Tool calling beats "output JSON" on frontier models. Claude Opus 4.6, Sonnet 4.6, Grok 4.20 and GPT-5.4 all support native tool calling. The model follows the schema exactly without extra parsing.

  • Tool calling (preferred): Define functions or tools in the API. Model calls them directly. Highest reliability. Lowest hallucinated JSON.
  • XML-style tags: Wrap output in tags such as `<answer>...</answer>`. Works when tool calling is unavailable. Add "Respond only with valid XML."
  • Strict JSON with schema: Tell the model "Output only valid JSON matching this schema." Add few-shot example. Still produces occasional invalid JSON on edge cases.
  • Markdown tables: Fast for simple data. Less reliable for nested structures.

Tool calling wins. Everything else adds post-processing work.
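A tool-calling setup is only as reliable as the validation around it. The sketch below checks a model's tool-call payload against a declared schema before executing it; the schema shape loosely mirrors common tool-calling APIs, but every field name here is illustrative rather than any vendor's exact wire format.

```python
# Sketch: validate a model's tool call against a declared schema
# before dispatching it. Schema and payload shapes are illustrative.
import json

TOOL_SCHEMA = {
    "name": "get_weather",
    "parameters": {
        "required": ["city"],
        "properties": {"city": {"type": "string"}},
    },
}

def validate_tool_call(raw: str, schema: dict) -> dict:
    """Parse a tool-call payload and check its required arguments."""
    call = json.loads(raw)                    # raises on invalid JSON
    if call.get("name") != schema["name"]:
        raise ValueError(f"unexpected tool: {call.get('name')}")
    args = call.get("arguments", {})
    for field in schema["parameters"]["required"]:
        if field not in args:
            raise ValueError(f"missing argument: {field}")
    return args
```

With native tool calling the provider enforces most of this for you, which is exactly why it beats free-text JSON.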

System Prompt Structure Template

This template works across Claude Opus 4.6, Sonnet 4.6, Grok 4.20, and GPT-5.4. Place it at the start of every serious session.

You are an expert AI assistant.

Core rules:
- Think step by step before answering.
- Be precise and concise. No filler.
- If you need information, use tools before guessing.
- Cite sources when relevant.
- For code, prefer clean, idiomatic implementations.

Task: [specific task here]

Available tools: [list if any]

Output format: [JSON/Tool call/Markdown as required]

Begin.

The first block sets role and rules. The later blocks give the actual task, tools, and output format. This separation reduces drift in long (1M-token) context windows.
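One way to keep that separation mechanical is to assemble the template from parts. This is plain string plumbing; the helper and parameter names are illustrative, not part of any API.

```python
# Sketch: build the system-prompt template above from parts, keeping
# the role/rules block separate from the task-specific block.

RULES = """You are an expert AI assistant.

Core rules:
- Think step by step before answering.
- Be precise and concise. No filler.
- If you need information, use tools before guessing.
- Cite sources when relevant.
- For code, prefer clean, idiomatic implementations."""

def build_prompt(task: str, tools=None, output_format: str = "Markdown") -> str:
    """Join the fixed rules block with the task-specific block."""
    tool_line = ", ".join(tools) if tools else "none"
    return (f"{RULES}\n\n"
            f"Task: {task}\n\n"
            f"Available tools: {tool_line}\n\n"
            f"Output format: {output_format}\n\n"
            f"Begin.")
```

Keeping `RULES` as one constant means every session starts from an identical rules block, so any drift you observe is attributable to the task text.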

Temperature and Top-p Settings

Temperature controls randomness, and Top-p (nucleus) cuts the tail. Most 2026 API calls default to temperature 0.7 and top-p 0.95. Override deliberately.

| Use Case | Temperature | Top-p | Notes |
| --- | --- | --- | --- |
| Factual extraction, code review, JSON output | 0.0 - 0.2 | 0.1 - 0.3 | Maximum determinism. Claude and GPT-5.4 stay on rails. |
| General reasoning, analysis | 0.3 - 0.6 | 0.5 - 0.8 | Balanced. Grok 4.20 performs well here. |
| Creative writing, brainstorming | 0.7 - 0.9 | 0.9 - 1.0 | More variety. Avoid for anything that needs consistency. |
| Research synthesis with multiple paths | 0.4 | 0.7 | Then sample 3-5 times and pick best. |

Run important tasks at temperature 0.0 first. Increase only when you want diversity.
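The "sample 3-5 times and pick best" row can be sketched as a small best-of-n helper. Both `sample_once` and `score` are injected stand-ins: in practice the sampler would be an API call at temperature ~0.4 and the scorer a rubric, a verifier, or a judge model.

```python
# Sketch: best-of-n sampling. The sampler and scorer are injected
# stubs; names and signatures are illustrative.

def best_of_n(sample_once, score, n: int = 5):
    """Draw n candidates and return the highest-scoring one."""
    candidates = [sample_once() for _ in range(n)]
    return max(candidates, key=score)
```

The design choice here is to keep sampling and scoring decoupled, so the same loop works whether the scorer is a regex check, a unit test, or another model call.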

Common Failure Modes

| Symptom | Cause | Fix |
| --- | --- | --- |
| Model ignores instructions or format | Weak system prompt or buried rules | Put rules in the system message. Repeat key constraints at the end of the user prompt. |
| Hallucinated facts or citations | No grounding or lazy recall | Force tool use first. Add "Only use information from the provided context or tools." |
| Loops forever or excessive tool calls | Poor ReAct formatting | Limit max turns in the agent loop. Add "After 3 actions, summarize and conclude." |
| Inconsistent JSON output | Asked for free-form JSON instead of a tool call | Use native tool-calling schemas. Never rely on text JSON for parsing. |
| Refuses task or over-refuses | Overly strict safety tuning | Rephrase as a technical exercise. Claude Opus 4.6 still refuses some edge topics. |
| Loses track in long context | Context stuffing without structure | Use clear headings, XML tags, or summarization steps every 50k tokens. |
| Shallow reasoning on complex task | No explicit CoT instruction | Start with "Think step by step and show your reasoning." |
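The tool-call limit from the table can be enforced in code rather than in the prompt. A minimal sketch, assuming a generic callable tool; `with_call_budget` and its cutoff message are illustrative, not a library API:

```python
# Sketch: wrap a tool so it stops answering after `limit` calls,
# forcing the agent toward a conclusion instead of looping.

def with_call_budget(tool, limit: int):
    """Return a wrapped tool that cuts off after `limit` calls."""
    calls = {"n": 0}                  # mutable counter shared by closures
    def guarded(*args, **kwargs):
        if calls["n"] >= limit:
            return "Budget exhausted: summarize and conclude."
        calls["n"] += 1
        return tool(*args, **kwargs)
    return guarded
```

A hard budget in code is more reliable than the prompt-level "After 3 actions, summarize and conclude" instruction, since the model cannot ignore it.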

Code Review Prompt Template

Review the following code for bugs, performance issues, security problems, and maintainability.

Project context: [describe briefly]

Code:
```[language]
[paste code]
```

Analyze step by step:

  1. What does this code do?
  2. Identify any defects or edge cases.
  3. Suggest specific improvements with code examples.
  4. Rate overall quality 1-10.

Output only in this structure:

  • One-paragraph overview
  • Bullet list of defects and edge cases
  • Bullet list of improvements with code snippets
  • Final score

Complex Multi-Step Reasoning Prompt Template

Solve this problem using Chain-of-Thought.

Problem: [full problem statement]

First, list all assumptions and clarify ambiguities.
Second, break the problem into independent subproblems.
Third, solve each subproblem completely.
Fourth, combine the results.
Fifth, verify the final answer against the original constraints.

Show all work. Flag any uncertainties.
Final answer must be in a boxed section at the end.

Research Synthesis Prompt Template

Synthesize the following sources into a coherent analysis.

Sources:
[source 1 title and key excerpts]
[source 2...]

Requirements:
- Identify points of agreement.
- Highlight contradictions and explain why they exist.
- Note gaps in the available information.
- Provide a final weighted conclusion based on source quality and recency.

Be explicit about which claims come from which source. Do not invent facts.

Structured Data Extraction Prompt Template

Extract structured data from the following document.

Document:
[paste text]

Schema:
{
 "field1": "string or null",
 "field2": "enum value",
 ...
}

Rules:
- Use null for missing or uncertain fields.
- Extract exact values when possible. Do not infer.
- Output ONLY valid JSON matching the schema. No other text.
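The "output ONLY valid JSON" rule still fails occasionally, so the caller should defend itself. A minimal sketch of that defense, using the hypothetical field1/field2 schema above: strip any stray text around the JSON, and fall back to nulls on parse failure, matching the template's own rule for missing fields.

```python
# Sketch: harden the extraction step. Strip stray text around the
# JSON object and degrade to null fields on failure. The field names
# match the hypothetical schema in the template above.
import json

EXPECTED_FIELDS = ["field1", "field2"]

def parse_extraction(reply: str) -> dict:
    """Best-effort parse of a JSON-only reply, tolerating stray text."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        return {f: None for f in EXPECTED_FIELDS}     # nothing parseable
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return {f: None for f in EXPECTED_FIELDS}
    # Ensure every schema field is present; missing -> null, per the rules.
    return {f: data.get(f) for f in EXPECTED_FIELDS}
```

With native tool calling this guard becomes unnecessary, which is the argument for preferring it in the first place.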

These templates reduce iteration time, but test each one on your target model. The same prompt often needs minor tweaks between Claude Sonnet 4.6 and GPT-5.4; the difference usually sits in how strictly each enforces the output format.
