Training, Fine-Tuning, or RAG: How to Choose for Your AI?

Introduction

A question comes up frequently when I work with companies on their AI projects: "Should we train our own model?" Or the slightly more advanced variant: "We want to fine-tune a model on our data."

Every time, I need to take a moment to explain what that actually means in practice. Because between training a model from scratch, fine-tuning it on your own data, or simply giving it context with a RAG, there is a world of difference. In cost, time, complexity, and above all in outcome.

In this article, I will try to lay things out simply: what an AI model is, how it is trained, what it costs, when it is worth it, and most importantly why, in 95% of cases, you probably do not need to do either.


Optimize Your RAG: The 8 Techniques That Make a Real Difference

You're probably optimizing in the wrong place

When a RAG isn't working well, here's what 90% of teams do: they change the prompt.

They rephrase the instructions, try different models, adjust the temperature. And sometimes it helps a little. But most of the time, that's not where the problem is.

Jason Liu, one of the most followed RAG experts, has a framing I find spot-on: "Before touching anything, reach 97% recall in retrieval."

97% recall means that in 97 out of 100 cases, the chunk containing the right answer is among the results you pass to the LLM. If you're not there, the best prompt in the world won't change a thing. The LLM cannot invent information that isn't in its context.
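To make that metric concrete, here is a minimal sketch of how retrieval recall can be measured. The function name and the toy IDs are illustrative, not from any particular library: you need a small evaluation set of queries, each labeled with the chunk(s) that actually contain the answer.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of queries for which at least one relevant chunk
    appears among the top-k retrieved results."""
    hits = 0
    for retrieved, relevant in zip(retrieved_ids, relevant_ids):
        if any(doc_id in relevant for doc_id in retrieved[:k]):
            hits += 1
    return hits / len(retrieved_ids)

# Toy example: 3 queries, with the chunks your retriever returned
# and the chunks known to contain the right answer.
retrieved = [["c1", "c7", "c3"], ["c9", "c2"], ["c4", "c8"]]
relevant = [{"c3"}, {"c2"}, {"c5"}]
print(recall_at_k(retrieved, relevant, k=5))  # 2 of 3 queries hit
```

If this number sits at 60% rather than 97%, no amount of prompt rewriting will recover the missing 40%: the information simply never reaches the model.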

The real RAG optimization order is: measure first, then retrieval, then generation. Not the other way around.


RAG Chunking Strategies: The Definitive Guide to Optimal Chunking

The chunking you're probably using is the worst one tested

Let me start with a result that surprised me when I first saw it.

Chroma Research published a benchmark comparing all common chunking strategies. They tested the default OpenAI Assistants parameters: 800 tokens, with 400 tokens of overlap. Their verdict is unambiguous: it is the configuration with the lowest precision across all their tests, at 1.4%. Their exact comment: "particularly poor recall-efficiency tradeoffs".

These are the parameters tens of thousands of projects are using right now, often because it's what the LangChain or LlamaIndex quick start suggests.

Meanwhile, configurations 4x simpler (200 tokens, zero overlap) perform 3.7x better on precision.
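For reference, fixed-size chunking with overlap is only a few lines of code, which is exactly why it deserves more thought than it usually gets. A minimal sketch of the two configurations discussed above, using a whitespace split as a stand-in for a real tokenizer (the benchmark counts model tokens, not words):

```python
def chunk_tokens(tokens, chunk_size, overlap):
    """Split a token list into fixed-size chunks, each starting
    (chunk_size - overlap) tokens after the previous one."""
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

# Whitespace split as a crude tokenizer, for illustration only.
tokens = ("lorem ipsum " * 1000).split()  # 2000 "tokens"

default_cfg = chunk_tokens(tokens, chunk_size=800, overlap=400)  # OpenAI Assistants default
small_cfg = chunk_tokens(tokens, chunk_size=200, overlap=0)      # the simpler configuration

print(len(default_cfg), len(small_cfg))  # 5 chunks vs 10 chunks
```

Note what the default configuration implies: with 400 tokens of overlap on 800-token chunks, half of every chunk is a duplicate of its neighbor, which inflates the index and dilutes precision.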

Chunking is the decision most teams spend the least time on. And yet it's probably the one with the highest impact on your RAG quality.