Blog

Welcome to my blog!

I share field notes regularly — thoughts, discoveries, and hands-on experience in artificial intelligence, a field that keeps evolving with plenty left to explore.

Why did I start this blog? As a freelance Data Scientist and AI consultant, I work daily on real client problems that I solve using AI. Through that process, I learn a lot — about business challenges, about how to integrate AI in ways that actually fit each context, and about making AI more accessible to everyone.

My goal through these articles is to demystify AI concepts, share concrete technical solutions, offer lessons from real projects, and give practical advice for integrating artificial intelligence into your professional work.

Happy reading, and feel free to reach out: anas@tensoria.fr!

You can also subscribe to my newsletter :)

Multi-Agent Systems: What Actually Works

Multi-agent systems are usually the first architecture people reach for. Specialized agents, an orchestrator that distributes tasks, clean hand-offs between roles. On paper, it looks elegant.

In production, it is a different story.

According to the MAST study published by UC Berkeley in March 2025, based on 1,600 execution traces, multi-agent systems fail between 41% and 86.7% of the time depending on the framework. And when they fail, the problem rarely comes from the model itself: it comes from the architecture.

Here is what the data actually says, and how to decide whether you need multiple agents or one well-equipped single agent.


CrewAI vs LangGraph vs Pydantic AI: Honest 2026 Pick

Every three months, a new AI agent framework drops and makes the front page of Reddit and Hacker News. CrewAI. LangGraph. AutoGen. Pydantic AI. Smolagents. And now Mastra, Agno, Letta, OpenAI Agents SDK, Inferable... The list grows every quarter.

The question everyone asks: which one should I pick?

The trap is believing there's a "best framework." The truth is that these tools don't target the same audience. And some of them are genuinely not built for serious data scientists who want to understand, optimize, and control what they build.

In this article, I'll walk through the five main frameworks — their real strengths, their concrete weaknesses, and who each one is honestly suited for. Plus a few outsiders worth knowing. And a direct recommendation on what I actually use on client engagements.


Embeddings in RAG: What They Are & Why They Matter

No embeddings, no ChatGPT answering questions about your documents. No semantic search that finds an article even when you type synonyms. No AI agent that remembers what you told it last week.

Embeddings are a foundational building block of modern AI applications. And yet, in the vast majority of projects I work on, they're the least understood component. Teams use them, often without really knowing why, and then wonder why results are disappointing.

In this article, I'll explain what embeddings actually are, how they work at a high level, why they matter so much, how to choose the right model in 2026, and the concrete pitfalls to avoid. Whether you're a manager or a developer, you should come away with a solid understanding of the topic.


The 7 Wrong Reflexes of RAG Teams (and How to Fix Them)

Introduction

When a RAG project stalls, it's almost never because of a missing technology. It's because of a chain of counter-productive reflexes that teams adopt without realizing it. You tweak the prompt when the problem is in the retrieval. You call it "working" after four manual tests. You stack advanced techniques before you've understood where things are actually breaking.

After roughly twenty RAG projects in consulting and audit engagements, I keep running into the same 7 reflexes. These aren't technical mistakes. They're cognitive biases. But they sabotage performance just as reliably as bad chunking. Here's the list, with the replacement reflex for each one.


AI Agent Memory: How It Really Works in Production

Without memory, an AI agent is just a better chatbot.

With poorly designed memory, it's an agent that fabricates recollections, contradicts what it said last week, and costs you a fortune in tokens. Memory is the most underestimated feature of AI agents in 2026. And it's the one that separates a fun prototype from a product that actually creates value.

In this article I'll walk you through the real taxonomy of memory in AI agents, the core technical pattern that very few people explain clearly (a small dedicated LLM that filters what's worth keeping), the tools on the market with their actual benchmark numbers, and how to choose based on your use case.


PDF Parsing for RAG: Extract Data That Actually Works

The problem nobody wants to face

8 out of 10 RAG systems that fail in production have a parsing problem upstream. Not a model issue, not a prompt issue, not a retriever issue. Just a PDF that was read badly from the start.

That's the pattern I see on almost every project I work on. A company spends weeks choosing its language model, configuring its vector database, tuning its prompts — and the system still misses the mark. Because the source document was misread right at the beginning.

Parsing (structured data extraction from a document) is the most underestimated step in any RAG pipeline. If your information retrieval from source files is approximate, the sophistication of everything else doesn't matter — you're building on sand. A badly extracted table, confused columns, an ignored technical diagram — and your LLM generates confidently wrong answers.

In this article, I'll show you why document structuring is so hard, how the 4 major tools on the market actually compare, and what I learned across two very different projects: factory documentation at Continental, and an e-commerce site with thousands of product pages.


Evaluate RAG in Production: Metrics, RAGAS & Audit

80% of the RAGs I audit have no evaluation system

That's a number I wish I could back with an academic citation. But it comes straight from the field: of the production RAG systems I've audited over the past two years, roughly 8 out of 10 have no structured evaluation system in place.

The pattern is always the same. The project gets shipped. The team "checked it manually" on 10 or 15 questions during QA. User feedback seems fine. And then nobody measures anything again.

The hidden cost of this gap is enormous. You don't know if the RAG is drifting after a document update. You don't know if a change in your embedding model broke something. You don't know whether the improvements you're making are actually gains, or just compensating for a regression somewhere else. You're optimizing blind.

This is the single biggest thing that separates a RAG proof-of-concept from a mature production system. A POC "works". A production system gets measured, monitored, and improved in a controlled way. This article covers the RAG metrics that actually matter, evaluation frameworks (RAGAS, DeepEval, TruLens), how to build a solid evaluation dataset, and how to set up continuous evaluation in production.


MCP (Model Context Protocol): Connect AI Agents to Any Tool

Everyone talks about AI agents. Nobody talks about how they actually connect to your tools.

Here's the concrete problem 90% of teams face when they want to build a serious AI agent: they have a capable LLM, a clear use case, and 4 or 5 tools to connect (a SQL database, Slack, Notion, GitHub). And then they end up writing a custom integration for each tool, for each model. If they switch LLMs tomorrow, they start over. If a colleague wants to reuse the Slack integration on a different agent, they start from scratch.

This is the N times M problem. N agents, M tools. You end up with N×M integrations to write and maintain.

MCP solves exactly this problem. The Model Context Protocol is the open standard launched by Anthropic in November 2024, and in 2026 it's becoming what HTTP is to the web: the invisible infrastructure everything runs on. OpenAI, Google, Microsoft, AWS — the entire ecosystem is converging on it. 97 million monthly SDK downloads in March 2026, up from 2 million at launch. That's unprecedented adoption for an AI tooling standard.
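To make the N×M arithmetic concrete, here is a minimal sketch. It assumes every agent needs every tool, and that a shared protocol means one client per agent plus one server per tool; the function names are hypothetical and only illustrate the counting, not any MCP API.

```python
def integrations_without_protocol(n_agents: int, n_tools: int) -> int:
    # Every agent gets its own custom integration for every tool.
    return n_agents * n_tools


def integrations_with_shared_protocol(n_agents: int, n_tools: int) -> int:
    # Assumption: one protocol client per agent, one protocol server per tool.
    return n_agents + n_tools


n_agents, n_tools = 3, 5
custom = integrations_without_protocol(n_agents, n_tools)
shared = integrations_with_shared_protocol(n_agents, n_tools)
print(f"custom integrations: {custom}, with a shared protocol: {shared}")
```

With 3 agents and 5 tools, that is 15 custom integrations against 8 protocol endpoints, and the gap widens as either number grows.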

In this article, I'll explain what MCP actually is, how its architecture works, how it differs from classic function calling, and most importantly: which projects to use it on (and which ones not to).


RAG vs Fine-Tuning vs Training from Scratch: The Real Costs

Introduction

A question comes up frequently when I work with companies on their AI projects: "Should we train our own model?" Or the slightly more advanced variant: "We want to fine-tune a model on our data."

Every time, I need to take a moment to explain what that actually means in practice. Because between training a model from scratch, fine-tuning it on your own data, or simply giving it context with a RAG, there is a world of difference. In cost, time, complexity, and above all in outcome.

In this article, I will try to lay things out simply: what an AI model is, how you train one, what it costs, when it is worth it, and most importantly why, in 95% of cases, you probably need neither training from scratch nor fine-tuning.


How to Optimize RAG: 8 Techniques with Measured Gains

You're probably optimizing in the wrong place

When a RAG isn't working well, here's what 90% of teams do: they change the prompt.

They rephrase the instructions, try different models, adjust the temperature. And sometimes it helps a little. But most of the time, that's not where the problem is.

Jason Liu, one of the most followed RAG experts, has a framing I find spot-on: "Before touching anything, reach 97% recall in retrieval."

97% recall means that in 97 out of 100 cases, the chunk containing the right answer is among the results you pass to the LLM. If you're not there, the best prompt in the world won't change a thing. The LLM cannot invent information that isn't in its context.
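Measuring that number takes very little code. Below is a minimal sketch, assuming you have a small evaluation set where each question maps to the ID of the one chunk that contains the answer, plus the ranked chunk IDs your retriever returned. The function name, the data layout, and the chunk IDs are all hypothetical.

```python
def recall_at_k(retrieved: dict, gold: dict, k: int) -> float:
    """Share of questions whose gold chunk appears in the top-k retrieved chunks."""
    hits = 0
    for question_id, gold_chunk_id in gold.items():
        top_k = retrieved.get(question_id, [])[:k]
        if gold_chunk_id in top_k:
            hits += 1
    return hits / len(gold)


# Hypothetical evaluation set: question -> chunk that contains the answer.
gold = {"q1": "c7", "q2": "c2", "q3": "c9", "q4": "c4"}

# Hypothetical retriever output: question -> ranked chunk IDs.
retrieved = {
    "q1": ["c7", "c1", "c3"],
    "q2": ["c5", "c2", "c8"],
    "q3": ["c1", "c3", "c6"],  # the gold chunk c9 was not retrieved
    "q4": ["c4", "c9", "c2"],
}

score = recall_at_k(retrieved, gold, k=3)
print(f"recall@3 = {score:.2f}")  # recall@3 = 0.75
```

Here the retriever finds the right chunk for 3 questions out of 4, so recall@3 is 75%. No prompt change can fix the fourth question, because the answer never reaches the LLM.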

The real RAG optimization order is: measure first, then retrieval, then generation. Not the other way around. If you're not yet familiar with the basics of how RAG works, start there before optimizing any component.