"Should we fine-tune?" is one of the most common, and most expensively answered, questions we hear in 2026. The honest answer for 90% of teams is "no, not yet." Here is the decision framework we use after shipping RAG, fine-tuning, and pure prompting across a dozen production AI projects.
The three approaches in one paragraph each
Prompting: talk to a frontier model (Claude, GPT, Gemini) with carefully crafted instructions and examples. Cheapest to iterate, hardest to scale to nuanced domains.
RAG (Retrieval-Augmented Generation): fetch relevant content from your own data store at query time and stuff it into the model's context. Bridges general intelligence with specific knowledge. Most production AI features should start here.
Fine-tuning: train a smaller model on your data to handle a specific task efficiently. Best for high-volume, narrow tasks where latency or cost matters more than general intelligence.
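To make the RAG description concrete, here is a minimal sketch of the retrieve-then-stuff loop. It assumes a toy bag-of-words `embed` function in place of a real embedding model and an in-memory list in place of a vector store. `DOCS`, `retrieve`, and `build_prompt` are hypothetical names, and the actual call to the model is left out.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory "data store"; in production this is a vector database.
DOCS = [
    "Refunds are processed within 5 business days of approval.",
    "Our POS terminals support contactless and chip payments.",
    "Support hours are 9am to 6pm, Monday to Friday.",
]

def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """'Stuff' the retrieved passages into the model's context."""
    sources = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the sources below. If they don't contain the answer, say so.\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )

question = "How long do refunds take?"
prompt = build_prompt(question, retrieve(question, DOCS))
print(prompt)  # this string is what would be sent to the frontier model
```

In a real system the `embed` stand-in becomes an embedding model and the `sorted` call becomes a vector-index query, but the shape of the loop stays the same.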
When to use what
- Use prompting when the task is general (writing, summarising, classifying common categories) and you don't have proprietary knowledge to inject.
- Use RAG when answers depend on your specific data: internal docs, customer history, product catalogue, knowledge base. That describes most B2B use cases.
- Use fine-tuning when you have more than 1M monthly calls on a narrow task and the unit economics demand a smaller, cheaper model, or when you need consistent stylistic output (brand voice, legal phrasing).
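The three rules above can be encoded as a small decision helper. This is a sketch under stated assumptions: the `Task` fields and `choose_approach` are hypothetical, and the hard 1M-call cutoff is the rule of thumb from the list, not a law. Accuracy targets, latency budgets, and team capacity also belong in a real decision.

```python
from dataclasses import dataclass

# Rule-of-thumb threshold from the list above ("> 1M monthly calls").
FINE_TUNE_MIN_MONTHLY_CALLS = 1_000_000

@dataclass
class Task:
    # Hypothetical fields; adapt them to how your team scopes a feature.
    needs_proprietary_data: bool   # internal docs, customer history, catalogue...
    monthly_calls: int
    narrow_task: bool              # e.g. a fixed label set or one output format
    needs_consistent_style: bool   # brand voice, legal phrasing

def choose_approach(task: Task) -> str:
    """Return a starting point ("prompting", "rag" or "fine-tuning"), not a verdict."""
    if task.narrow_task and task.monthly_calls > FINE_TUNE_MIN_MONTHLY_CALLS:
        return "fine-tuning"   # volume justifies a smaller, cheaper model
    if task.needs_consistent_style:
        return "fine-tuning"   # consistent stylistic output
    if task.needs_proprietary_data:
        return "rag"           # answers depend on your own data
    return "prompting"         # general task, nothing proprietary to inject

# The legal classifier described below: 200k docs/day is roughly 6M calls a month.
legal = Task(needs_proprietary_data=False, monthly_calls=6_000_000,
             narrow_task=True, needs_consistent_style=False)
print(choose_approach(legal))  # fine-tuning
```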
The default that works for almost everyone
Frontier model + good retrieval (RAG) + smart caching. This stack handles 80% of production use cases at acceptable cost. We've shipped it for healthcare, legal, ecommerce, and POS clients.
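"Smart caching" can mean several things. The simplest version is an exact-match response cache: an identical request to the same model is answered from memory instead of paying for a second call. The sketch below assumes a caller-supplied `call_model(model, prompt)` function (hypothetical, standing in for whichever SDK you use) and an in-process dict. A production setup would more likely use a shared store, and may add semantic caching or the prompt-caching features some model providers offer.

```python
import hashlib
import time
from typing import Callable

class ResponseCache:
    """Exact-match cache for model responses, keyed on model name + normalized prompt."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Collapse runs of whitespace so trivially different prompts share a key.
        normalized = " ".join(prompt.split())
        return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str,
                    call_model: Callable[[str, str], str]) -> str:
        key = self._key(model, prompt)
        hit = self._store.get(key)
        if hit is not None and time.monotonic() - hit[0] < self.ttl:
            return hit[1]                      # hit: no model call, no token cost
        response = call_model(model, prompt)   # miss or expired: pay for one call
        self._store[key] = (time.monotonic(), response)
        return response

# Usage with a fake model that records how often it is actually called.
calls: list[str] = []

def fake_model(model: str, prompt: str) -> str:
    calls.append(prompt)
    return f"answer to: {prompt}"

cache = ResponseCache()
cache.get_or_call("some-model", "What are your support hours?", fake_model)
cache.get_or_call("some-model", "What are your  support hours? ", fake_model)
print(len(calls))  # 1: the second request is served from the cache
```

Whether this pays off depends on how often users repeat the same request; for free-form chat the exact-match hit rate may be low.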
The RAG mistakes to avoid
- Bad chunking. Splitting docs naively (fixed-size cuts that land mid-sentence) kills retrieval quality. Use semantic chunking with overlap between chunks.
- Mono-embeddings. One embedding model for everything is a trap. Match the model to your domain.
- No reranking. Initial retrieval is noisy. A rerank step (with a smaller LLM or a dedicated reranker) lifts answer quality dramatically.
- Stale data. Build an incremental indexing pipeline from day one instead of re-indexing everything nightly.
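Two of these fixes are easy to sketch: chunking with overlap and incremental indexing. Note that the chunker below is sentence-aware, not truly semantic (semantic chunking usually groups sentences by embedding similarity); it only avoids cutting mid-sentence and repeats the last sentence of each chunk at the start of the next. The sentence regex is naive (it will split on abbreviations such as "e.g."), the embedding and vector-store writes are omitted, and `chunk_text`, `incremental_update`, and the index layout are hypothetical.

```python
import hashlib
import re

def chunk_text(text: str, max_words: int = 120, overlap_sentences: int = 1) -> list[str]:
    """Pack whole sentences into chunks of up to max_words, carrying the last
    `overlap_sentences` sentences of each chunk into the next one."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        size = sum(len(s.split()) for s in current)
        if current and size + len(sentence.split()) > max_words:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks

def doc_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def incremental_update(index: dict[str, dict], docs: dict[str, str]) -> dict[str, list[str]]:
    """Re-chunk only documents whose content hash changed and drop deleted ones.
    `index` maps doc_id -> {"hash": ..., "chunks": [...]}; `docs` is the current corpus."""
    changed, removed = [], []
    for doc_id, text in docs.items():
        h = doc_hash(text)
        if index.get(doc_id, {}).get("hash") != h:
            # In a real pipeline, these chunks would be (re-)embedded and upserted here.
            index[doc_id] = {"hash": h, "chunks": chunk_text(text)}
            changed.append(doc_id)
    for doc_id in list(index):
        if doc_id not in docs:
            del index[doc_id]  # and delete its vectors from the store
            removed.append(doc_id)
    return {"changed": changed, "removed": removed}

index: dict[str, dict] = {}
corpus = {"a": "First doc. It has two sentences.", "b": "Second doc."}
print(incremental_update(index, corpus))  # {'changed': ['a', 'b'], 'removed': []}
corpus["a"] = "First doc, edited."
del corpus["b"]
print(incremental_update(index, corpus))  # {'changed': ['a'], 'removed': ['b']}
```

Only changed documents pay the embedding cost on each run, which is what makes frequent (rather than nightly) updates affordable.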
When fine-tuning is actually worth it
Three real scenarios where we have fine-tuned:
- A legal document classifier: 200k docs/day across 12 categories. Fine-tuned Haiku is 6× cheaper than prompted Sonnet at equal accuracy.
- A clinical note summariser where consistency of structure matters more than depth.
- A brand voice rewriter for a publisher: the prompt-engineered version drifted, while the fine-tuned version stays steady.
How to budget
For most products: start with prompting → add RAG when domain-specific accuracy is needed → consider fine-tuning only when usage justifies it (typically 6–12 months in). Following this path keeps your AI investment ROI-positive from month one.
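The "only when usage justifies it" test is simple arithmetic: fine-tuning pays off once the per-call savings cover the one-off cost of building it. The sketch below plugs in the 200k docs/day volume and 6× cost ratio from the legal classifier example. The $0.002 per prompted call and the $20,000 upfront cost (data preparation, training runs, evaluation) are hypothetical placeholders, and ongoing costs such as hosting or periodic retraining are left out.

```python
def months_to_break_even(monthly_calls: int,
                         prompted_cost_per_call: float,
                         cost_ratio: float,
                         upfront_cost: float) -> float:
    """Months until per-call savings repay a one-off fine-tuning cost.

    cost_ratio is how many times cheaper the fine-tuned model is per call.
    Ongoing costs (hosting, retraining) are deliberately ignored here.
    """
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    fine_tuned_cost_per_call = prompted_cost_per_call / cost_ratio
    monthly_savings = monthly_calls * (prompted_cost_per_call - fine_tuned_cost_per_call)
    if monthly_savings <= 0:
        return float("inf")  # never pays back
    return upfront_cost / monthly_savings

high_volume = 200_000 * 30  # 200k docs/day is about 6M calls/month
low_volume = 100_000        # a hypothetical smaller feature

print(round(months_to_break_even(high_volume, 0.002, 6, 20_000), 1))  # 2.0
print(round(months_to_break_even(low_volume, 0.002, 6, 20_000), 1))   # 120.0
```

At the same hypothetical prices, a feature with 100k calls a month would take ten years to pay back, which is why the volume threshold matters.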
If you are planning AI for your product, talk to our AI team or browse our AI voice assistant case study.