Blog

Long Context vs RAG in 2026: When to Use Which?

Introduction

Every time a model ships with a larger context window, the debate resurfaces: "RAG is dead, just put everything in the context." In 2026, Gemini 3.1 Pro pushes up to 2 million tokens, while Claude and GPT hold at 1M. The question is legitimate.

But in the field, it's not that simple. I've seen teams burn thousands of euros on API calls thinking they were "simplifying" their stack by dropping RAG. I've also seen teams build a full RAG pipeline to answer questions on 3 pages of documentation. In both cases, they were using the wrong tool for the problem.

AI Agent Memory: How It Really Works in Production

Without memory, an AI agent is just a better chatbot.

With poorly designed memory, it's an agent that fabricates recollections, contradicts what it said last week, and costs you a fortune in tokens. Memory is the most underestimated feature of AI agents in 2026. And it's the one that separates a fun prototype from a product that actually creates value.

In this article I'll walk you through the real taxonomy of memory in AI agents, the core technical pattern that very few people explain clearly (a small dedicated LLM that filters what's worth keeping), the tools on the market with their actual benchmark numbers, and how to choose based on your use case.

PDF Parsing for RAG: Extract Data That Actually Works

The problem nobody wants to face

8 out of 10 RAG systems that fail in production have a parsing problem upstream. Not a model issue, not a prompt issue, not a retriever issue. Just a PDF that was read badly from the start.

That's the pattern I see on almost every project I work on. A company spends weeks choosing its language model, configuring its vector database, tuning its prompts — and the system still misses the mark. Because the source document was misread right at the beginning.

Parsing (structured data extraction from a document) is the most underestimated step in any RAG pipeline. If your information retrieval from source files is approximate, the sophistication of everything else doesn't matter — you're building on sand. A badly extracted table, confused columns, an ignored technical diagram — and your LLM generates confidently wrong answers.

In this article, I'll show you why document structuring is so hard, how the 4 major tools on the market actually compare, and what I learned across two very different projects: factory documentation at Continental, and an e-commerce site with thousands of product pages.

AI Agent vs n8n, Make, Zapier: Which One for Your Business?

6 out of 10 businesses asking for an AI agent don't need one

Out of every 10 businesses that contact me saying "we want an AI agent," 6 don't actually need one.

They need a solid n8n workflow with a well-configured OpenAI node. And nobody tells them that, because custom agencies would rather sell a €30K AI agent than a €3K n8n workflow. That's understandable. But it isn't honest.

So in this article, I'll share the decision framework I use with my own clients. No sales pitch. Just the criteria that tell you whether you need an AI agent vs n8n (or Make, or Zapier) for your specific situation.

My position from the start: in the majority of SMB cases, a well-built n8n workflow with an LLM node is enough. A custom AI agent is only necessary in specific situations. I'll show you exactly which ones.

Evaluate RAG in Production: Metrics, RAGAS & Audit

80% of the RAGs I audit have no evaluation system

That's a number I wish I could back with an academic citation. But it comes straight from the field: of the production RAG systems I've audited over the past two years, roughly 8 out of 10 have no structured evaluation system in place.

The pattern is always the same. The project gets shipped. The team "checked it manually" on 10 or 15 questions during QA. User feedback seems fine. And then nobody measures anything again.

The hidden cost of this gap is enormous. You don't know if the RAG is drifting after a document update. You don't know if a change in your embedding model broke something. You don't know whether the improvements you're making are actually gains, or just compensating for a regression somewhere else. You're optimizing blind.

This is the single biggest thing that separates a RAG proof-of-concept from a mature production system. A POC "works". A production system gets measured, monitored, and improved in a controlled way. This article covers the RAG metrics that actually matter, evaluation frameworks (RAGAS, DeepEval, TruLens), how to build a solid evaluation dataset, and how to set up continuous evaluation in production.
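To make "improved in a controlled way" concrete, here is a minimal sketch of a regression check between two evaluation runs. The function name, the tolerance value, and the scores are all assumptions for illustration; the metric names simply mirror ones found in frameworks like RAGAS, and no framework is actually called here.

```python
from typing import Dict, Optional, Tuple


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.02,  # assumed noise margin, tune it on your own data
) -> Dict[str, Tuple[float, Optional[float]]]:
    """Return metrics that dropped by more than `tolerance`, or went missing."""
    regressions: Dict[str, Tuple[float, Optional[float]]] = {}
    for metric, base_score in baseline.items():
        new_score = candidate.get(metric)
        if new_score is None or new_score < base_score - tolerance:
            regressions[metric] = (base_score, new_score)
    return regressions


# Illustrative scores only, not real measurements.
baseline_run = {"faithfulness": 0.90, "context_recall": 0.85}
candidate_run = {"faithfulness": 0.91, "context_recall": 0.78}

regressed = find_regressions(baseline_run, candidate_run)
for name, (before, after) in regressed.items():
    print(f"{name}: {before} -> {after}")
```

Run on the same evaluation dataset after every document update or embedding change, a check like this is what turns "it seems fine" into a measured comparison.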

MCP (Model Context Protocol): Connect AI Agents to Any Tool

Everyone talks about AI agents. Nobody talks about how they actually connect to your tools.

Here's the concrete problem 90% of teams face when they want to build a serious AI agent: they have a capable LLM, a clear use case, and 4 or 5 tools to connect (a SQL database, Slack, Notion, GitHub). And then they end up writing a custom integration for each tool, for each model. If they switch LLMs tomorrow, they start over. If a colleague wants to reuse the Slack integration on a different agent, they start from scratch.

This is the N times M problem. N agents, M tools. You end up with N×M integrations to write and maintain.

MCP solves exactly this problem. The Model Context Protocol is the open standard launched by Anthropic in November 2024, and in 2026 it's becoming what HTTP is to the web: the invisible infrastructure everything runs on. OpenAI, Google, Microsoft, AWS — the entire ecosystem is converging on it. 97 million monthly SDK downloads in March 2026, up from 2 million at launch. That's unprecedented adoption for an AI tooling standard.
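The arithmetic behind the N times M problem can be sketched in a few lines. This assumes a simplified model in which, with a shared protocol, each agent implements one client and each tool is wrapped once as a server; the function names are hypothetical and only count integrations.

```python
def integrations_point_to_point(n_agents: int, m_tools: int) -> int:
    """Every agent needs its own custom integration for every tool."""
    return n_agents * m_tools


def integrations_shared_protocol(n_agents: int, m_tools: int) -> int:
    """Assumed model: one client per agent plus one server per tool."""
    return n_agents + m_tools


agents, tools = 3, 5
custom = integrations_point_to_point(agents, tools)
shared = integrations_shared_protocol(agents, tools)
print(f"custom integrations: {custom}, with a shared protocol: {shared}")
```

With 3 agents and 5 tools that is 15 integrations versus 8, and the gap widens with every agent or tool you add.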

In this article, I'll explain what MCP actually is, how its architecture works, how it differs from classic function calling, and most importantly: which projects to use it on (and which ones not to).

RAG vs Fine-Tuning vs Training from Scratch: The Real Costs

Introduction

A question comes up frequently when I work with companies on their AI projects: "Should we train our own model?" Or the slightly more advanced variant: "We want to fine-tune a model on our data."

Every time, I need to take a moment to explain what that actually means in practice. Because between training a model from scratch, fine-tuning it on your own data, or simply giving it context with RAG, there is a world of difference. In cost, time, complexity, and above all in outcome.

In this article, I will lay things out simply: what an AI model is, how you train it, what it costs, when it is worth it, and most importantly why in 95% of cases you probably do not need to train or fine-tune anything.

How to Optimize RAG: 8 Techniques with Measured Gains

You're probably optimizing in the wrong place

When a RAG isn't working well, here's what 90% of teams do: they change the prompt.

They rephrase the instructions, try different models, adjust the temperature. And sometimes it helps a little. But most of the time, that's not where the problem is.

Jason Liu, one of the most followed RAG experts, has a framing I find spot-on: "Before touching anything, reach 97% recall in retrieval."

97% recall means that in 97 out of 100 cases, the chunk containing the right answer is among the results you pass to the LLM. If you're not there, the best prompt in the world won't change a thing. The LLM cannot invent information that isn't in its context.
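Measuring this takes very little code. The sketch below assumes you have, for each test question, the ranked chunk IDs your retriever returned and the ID of the chunk that contains the answer; the function name and the sample data are hypothetical.

```python
from typing import Sequence


def retrieval_recall(
    results: Sequence[Sequence[str]],  # ranked chunk IDs, one list per question
    gold: Sequence[str],               # ID of the chunk holding the right answer
    k: int,                            # how many chunks you pass to the LLM
) -> float:
    """Share of questions whose gold chunk appears in the top-k results."""
    if not gold or len(results) != len(gold):
        raise ValueError("results and gold must be non-empty and the same length")
    hits = sum(1 for ranked, answer in zip(results, gold) if answer in ranked[:k])
    return hits / len(gold)


# Toy data: 4 questions, 2 retrieved chunks each.
retrieved = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"]]
answers = ["b", "x", "e", "h"]

recall_top2 = retrieval_recall(retrieved, answers, k=2)
recall_top1 = retrieval_recall(retrieved, answers, k=1)
print(recall_top2, recall_top1)
```

Here the second question is a miss at any k, because the chunk holding the answer was never retrieved. No prompt change can recover that.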

The real RAG optimization order is: measure first, then retrieval, then generation. Not the other way around. If you're not yet familiar with the basics of how RAG works, start there before optimizing any component.

Optimal RAG Chunking: 8 Strategies & Real Benchmarks

The chunking you're probably using is the worst one tested

Let me start with a result that surprised me when I first saw it.

Chroma Research published a benchmark comparing all common chunking strategies. They tested the default OpenAI Assistants parameters: 800 tokens, 400 tokens of overlap. Their verdict is unambiguous — it's the configuration with the lowest precision across all tests. 1.4% precision. Their exact comment: "particularly poor recall-efficiency tradeoffs".

These are the parameters tens of thousands of projects are using right now, often because it's what the LangChain or LlamaIndex quick start suggests.

Meanwhile, simpler configurations with chunks 4x smaller (200 tokens, zero overlap) perform 3.7x better on precision.

Chunking is the decision most teams spend the least time on. And yet it's probably the one with the highest impact on your RAG quality.
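To see what those two configurations do to the same document, here is a minimal fixed-size chunker. It is a sketch under assumptions: the token list stands in for real tokenizer output, and the function is illustrative, not the code used in the benchmark.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[int], size: int, overlap: int = 0) -> List[List[int]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks


document = list(range(1000))  # stand-in for 1,000 token IDs

small = chunk_tokens(document, size=200, overlap=0)
large = chunk_tokens(document, size=800, overlap=400)
print(len(small), [len(c) for c in large])
```

On this 1,000-token stand-in, the 200/0 setup yields five tight chunks with no duplication, while 800/400 yields two large chunks that repeat 400 tokens between them.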

Hybrid RAG: BM25 + Vector Search With +10% Recall

Your vector RAG is missing questions you don't even know about

It's a comment I hear often on RAG projects: "It works well in general, but sometimes it finds nothing on questions that seem straightforward."

Concrete example: "What is the ISO-27001 procedure for remote access?" → 0 relevant results.

Vector search encodes meaning. But when a query contains an exact identifier — a standard name, a product code, a domain acronym — semantic encoding fails completely.

This is what's called vocabulary mismatch. And it's the problem hybrid search solves.
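One common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion. The sketch below assumes you already have two ranked lists of document IDs, one from BM25 and one from vector search; the IDs are made up, and k=60 is a conventional default rather than a tuned value. It illustrates the idea and is not necessarily the fusion method behind the recall figure in the title.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked lists: each document scores 1 / (k + rank) per list it appears in."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the ISO-27001 query.
bm25_ranking = ["iso_doc", "vpn_doc", "hr_doc"]        # exact-term matches
vector_ranking = ["vpn_doc", "remote_doc", "iso_doc"]  # semantic matches

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

A document that only the keyword side ranks highly still lands near the top of the fused list, which is the point of running both retrievers.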