RAG vs Fine-Tuning vs Agents Decision Guide

Reviewed by Josh Ausmus · Updated April 2026

Decision Tree (Text Flowchart)

Can better prompting solve this?
├── Yes → Stop here. Use advanced prompting + chain-of-thought + few-shot.
└── No → Does it need up-to-date or external facts/knowledge?
    ├── Yes → Use RAG (or RAG + prompting).
    └── No → Does it need specialized behavior, style, or consistent output format on fixed tasks?
        ├── Yes → Use fine-tuning.
        └── No → Does it require multi-step reasoning, tool use, or autonomous decisions?
            ├── Yes → Use agents (or agents + RAG).
            └── No → Combine prompting + RAG first. Re-evaluate.
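
The branching above can be sketched as a plain function. The boolean flag names are hypothetical, chosen for this illustration only:

```python
def choose_approach(prompting_suffices: bool,
                    needs_external_knowledge: bool,
                    needs_specialized_behavior: bool,
                    needs_multistep_tools: bool) -> str:
    """Walk the decision tree top to bottom and return the recommendation."""
    if prompting_suffices:
        return "advanced prompting"
    if needs_external_knowledge:
        return "RAG"
    if needs_specialized_behavior:
        return "fine-tuning"
    if needs_multistep_tools:
        return "agents"
    return "prompting + RAG, then re-evaluate"
```

Note the order matters: cheaper options are checked first, mirroring the tree's "stop here" bias toward prompting.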

Comparison Table

| Approach | Cost to start | Ongoing cost | Latency | Accuracy ceiling | Setup time | Maintenance | Best for |
|---|---|---|---|---|---|---|---|
| Better Prompting | $0 | API tokens only | Lowest | Medium (hallucinations persist) | Hours | Low (prompt tweaks) | Prototypes, simple tasks, formatting |
| RAG | Low (vector DB) | Storage + retrieval + tokens | Medium | High on knowledge tasks | Days | Medium (data updates, chunking) | Knowledge bases, docs, Q&A with citations |
| Fine-Tuning | Medium | Training compute + higher inference | Lowest after training | Highest on narrow tasks | Weeks | High (retrain on drift) | Domain-specific style, JSON output, tone |
| Agents | Medium-High | Tokens × steps + tool calls | Highest | High on complex workflows | Weeks | High (debug loops, tool reliability) | Tool-using workflows, multi-step planning |

Signs You Need RAG (6 items)

  • Knowledge changes often. Retraining is too slow.
  • You must cite sources or avoid hallucinations on facts.
  • Dataset is large (thousands of docs) and mostly static.
  • Users ask about specific products, policies, or manuals.
  • You need to keep data private but still query it.
  • Prompting works until the model makes up details.
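
A toy sketch of the RAG retrieval step those signs lead to: rank documents against the query, then paste the winner into the prompt. The bag-of-words "embedding" below is a stand-in for a real embedding model, and the document strings are invented for illustration:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; production systems use a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank all documents by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Return policy: items may be returned within 30 days.",
    "Shipping: orders ship within 2 business days.",
    "Warranty: hardware is covered for one year.",
]
context = retrieve("how long do I have to return an item", docs, k=1)
prompt = f"Answer using only this context:\n{context[0]}\n\nQ: How long to return?"
```

The grounding step is the last line: the model is told to answer from the retrieved context, which is what enables citations and curbs made-up details.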

Signs You Need Fine-Tuning (6 items)

  • Output must follow a rigid format (JSON, XML) every time.
  • You want a consistent brand voice or writing style.
  • Task is narrow and repetitive with clear input-output pairs.
  • Inference latency matters more than update frequency.
  • You have 1K+ high-quality labeled examples.
  • Model ignores instructions even after heavy prompting.
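
When those signs point to fine-tuning, the first concrete step is usually assembling the input-output pairs. A minimal sketch of writing chat-format training data as JSONL (the `messages` structure follows OpenAI's fine-tuning docs; the example pairs and filename are invented):

```python
import json

# Hypothetical labeled pairs: raw input → desired rigid JSON output.
pairs = [
    ("Summarize: The meeting moved to Friday.", '{"summary": "Meeting moved to Friday."}'),
    ("Summarize: Launch slipped two weeks.", '{"summary": "Launch slipped two weeks."}'),
]

def to_record(user_text: str, assistant_text: str) -> dict:
    # One training example = one chat transcript ending in the target output.
    return {
        "messages": [
            {"role": "system", "content": 'Reply with JSON: {"summary": ...}'},
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": assistant_text},
        ]
    }

# One JSON object per line is the expected JSONL upload format.
with open("train.jsonl", "w") as f:
    for user_text, assistant_text in pairs:
        f.write(json.dumps(to_record(user_text, assistant_text)) + "\n")
```

Keeping the system prompt identical across every record is what teaches the model the consistent format, so it no longer needs heavy prompting at inference time.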

Signs You Need Agents (6 items)

  • Task involves multiple steps with conditional branching.
  • You need to call external tools or APIs during reasoning.
  • Goal is complex: research, booking, code debugging.
  • Single prompt fails but breaking into steps succeeds.
  • You accept variable latency for better outcomes.
  • Hallucination in planning is tolerable with verification steps.
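
A stripped-down sketch of the agent loop these signs imply: a model picks an action each step, tools are plain functions, and a step cap guards against runaway loops. `fake_model` is a hard-coded stand-in for a real LLM call, invented for this illustration:

```python
def calculator(expr: str) -> str:
    # Toy tool; never eval untrusted input in real systems.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def fake_model(goal: str, history: list[str]) -> dict:
    # Stand-in policy: call the calculator once, then finish with its result.
    if not history:
        return {"action": "calculator", "input": goal}
    return {"action": "finish", "input": history[-1]}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history: list[str] = []
    for _ in range(max_steps):  # step cap: prevents infinite reasoning loops
        decision = fake_model(goal, history)
        if decision["action"] == "finish":
            return decision["input"]
        result = TOOLS[decision["action"]](decision["input"])
        history.append(result)  # tool output feeds the next decision
    return "gave up after max_steps"

print(run_agent("2 + 2 * 3"))  # → 8
```

The `history` list doubles as a trace log, which matters once you hit the debugging failure modes described later.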

Cost Comparison Table: Fine-Tuning at Different Scales

Approximate training costs in USD, assuming ~500 tokens per example on average. OpenAI charges per-million-token training fees (roughly $8 - 25/M tokens processed, depending on base model). Anthropic currently offers no public fine-tuning (API-only focus). Open-source figures assume LoRA on rented GPUs.

| Scale | OpenAI (e.g. GPT-5.4-mini class) | Anthropic | Open Source (LoRA on 70B, e.g. via Together/Fireworks) |
|---|---|---|---|
| 1K examples | ~$20 - 80 | N/A | $5 - 30 (few hours on A100) |
| 10K examples | ~$200 - 800 | N/A | $50 - 300 (1 - 2 days) |
| 100K examples | ~$2K - 8K | N/A | $500 - 3K (multi-day or distributed) |

Ongoing inference for fine-tuned models runs 2 - 8× base API cost. Self-hosted open-source drops to hardware cost after training.[1]

[1]: https://pricepertoken.com/fine-tuning
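
The table's numbers can be sanity-checked from its stated assumptions (~500 tokens per example, per-million-token training fees). The default price and epoch count below are illustrative, not vendor quotes:

```python
def training_cost_usd(n_examples: int, tokens_per_example: int = 500,
                      price_per_m_tokens: float = 8.0, epochs: int = 1) -> float:
    # Total tokens processed = examples × tokens each × training epochs.
    tokens = n_examples * tokens_per_example * epochs
    return tokens / 1_000_000 * price_per_m_tokens

# 10K examples, one epoch at the $8/M floor:
print(training_cost_usd(10_000))  # → 40.0
# 10K examples, 4 epochs at the $25/M ceiling lands in the table's band:
print(training_cost_usd(10_000, price_per_m_tokens=25.0, epochs=4))  # → 500.0
```

The spread in the table comes mostly from the epoch count and base-model pricing, which is why each cell is a range rather than a point estimate.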

Common Failure Modes

Better Prompting

  • Model still hallucinates facts.
  • Prompt grows beyond context window.
  • Inconsistent output across similar inputs.
  • Breaks when instructions get complex.

RAG

  • Bad chunking or embeddings → irrelevant retrieval.
  • No citations or lost-in-the-middle problem.
  • Vector DB gets stale without update pipeline.
  • High token use from long retrieved context.
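
One common guard against the bad-chunking failure above is fixed-size chunking with overlap, so a sentence split at one boundary still appears whole in the adjacent chunk. The sizes below are illustrative, not tuned:

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Slide a fixed window across the text; consecutive chunks share
    # `overlap` characters so no sentence is lost at a boundary.
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Real pipelines usually split on sentence or section boundaries rather than raw character counts, but the overlap idea carries over unchanged.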

Fine-Tuning

  • Catastrophic forgetting of general capabilities.
  • Overfits to training data, poor on edge cases.
  • Expensive to retrain when data drifts.
  • Data preparation takes longer than expected.

Agents

  • Loops forever or gets stuck in reasoning.
  • Tool calls fail silently or with bad parameters.
  • Latency explodes with more steps.
  • Hard to debug without full trace logging.

Use prompting first. Add RAG when knowledge is the gap. Fine-tune when behavior is the gap. Deploy agents only when the workflow demands planning and tools. Most production wins come from good prompting plus RAG. The rest is usually overkill until scale proves otherwise.

Related Guides

What Are AI Reasoning Tokens: Hidden Compute Costs
Hidden chain-of-thought computations in OpenAI o3 and DeepSeek R1 multiply costs 5-20x during test-time compute.

FPGA vs Microcontroller: Which Runs Your Smart Home Hub
MCUs are preferred for lower cost, simpler updates, and better power in smart home hubs.

Zigbee vs Z-Wave: The Protocols Running Your Smart Home
Key tradeoffs in mesh behavior, RF reliability, and MCU overhead for smart home scaling.