RAG · March 31, 2026 · 10 min read · By Olatunde Adedeji

Why I Chose RAG Over Fine-Tuning

A deep dive on why retrieval-augmented generation was the right architectural choice for EviVault, with practical trade-offs around trust, updates, provenance, privacy, and maintainability.

Organizations rarely struggle to create knowledge. They struggle to find it, trust it, and use it quickly.

That gap is where many internal AI projects begin. A team has policy documents, onboarding guides, process manuals, technical write-ups, and scattered notes across folders and systems. The knowledge exists. The friction comes from access, verification, and speed.

When I started shaping EviVault Assistant, one architectural question surfaced early:

Should this system rely on fine-tuning, or should it rely on retrieval?

That question matters because it influences almost every other product decision. It affects how the system handles updates, how easily users can verify answers, how much operational complexity the platform absorbs, and how trustworthy the final experience feels.

For EviVault, I chose retrieval-augmented generation, not fine-tuning.

This was not because fine-tuning is unhelpful. It was because retrieval matched the problem better.

The problem I was solving

EviVault was designed as an internal document intelligence platform. Users upload operational files such as policy documents, onboarding materials, procedures, internal guides, and similar knowledge sources. The system extracts text, chunks it, embeds it, stores it in a vector database, retrieves relevant passages for a query, and returns an answer grounded in those passages.
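The chunking step in that pipeline can be sketched as follows. This is a minimal illustration, not EviVault's actual implementation; the `chunk_text` helper name, the 500-character chunk size, and the 100-character overlap are assumptions chosen for clarity:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split extracted text into fixed-size chunks that overlap, so a
    sentence straddling a boundary still appears whole in at least one chunk."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than the chunk size to create the overlap.
        start += chunk_size - overlap
    return chunks
```

Each chunk is then embedded and written to the vector store, keyed back to its source file so answers can cite it later.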

That workflow immediately reveals the nature of the product. This is not primarily a style problem. It is not mainly a “teach the model how to sound like the company” problem. It is a knowledge access and trust problem.

The core product questions were:

  • How do I help users find the right information quickly?
  • How do I make the answer traceable to the source?
  • How do I keep the system useful when documents change?
  • How do I avoid polished but unsupported answers?

Those questions pushed the design toward retrieval from the start.

What RAG actually changes

Retrieval-augmented generation changes where knowledge lives.

With RAG, knowledge remains in the document layer. The model is still useful, but it is not expected to store the latest operational truth inside its weights. Instead, the system retrieves relevant passages at query time and uses them as explicit context for the answer.

At a high level, the pattern looks like this:

```text
Question → Retrieve relevant chunks → Build context → Generate grounded answer
```
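In code, hypothetical glue for those four steps might look like this. The `retriever` and `llm` callables are placeholders for whatever vector store and model client a real system wires in, not names from the EviVault codebase:

```python
def answer_question(question: str, retriever, llm, top_k: int = 5) -> str:
    # 1. Retrieve: fetch the passages most similar to the question.
    chunks = retriever(question, top_k)
    # 2. Build context: join the retrieved passages with numbered markers
    #    so the answer can point back at specific passages.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    # 3. Generate: instruct the model to answer only from the supplied context.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)
```

The key property is that the model never has to "remember" the documents; they arrive as explicit context on every query.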

That flow sounds simple, but it changes the product in important ways.

When knowledge lives in the retrieval layer:

  • new documents can be indexed without retraining a model
  • outdated content can be removed from the searchable corpus
  • answers can point back to source files and excerpts
  • teams can inspect the evidence that shaped the answer
  • refusal becomes easier when strong support is missing

That last point matters more than many teams expect. A grounded system does not only answer better. It can also fail more honestly.

What fine-tuning optimizes for

Fine-tuning is useful. It is just useful for different things.

Fine-tuning generally makes more sense when you want to shape model behavior, output structure, tone, domain phrasing, or task-specific patterns. It can help a model become more consistent in how it writes, formats, classifies, or follows specific instructions.

That is different from using it as the primary mechanism for keeping operational knowledge current.

For internal document workflows, pushing changing knowledge into model weights creates practical tension:

  • updating knowledge becomes slower
  • provenance becomes harder to show
  • answer verification becomes less direct
  • retraining adds cost and operational friction
  • stale or mixed knowledge can remain hidden inside the model behavior

In other words, fine-tuning can improve how a model behaves, but it is often not the cleanest way to manage living internal knowledge.

That distinction shaped the EviVault architecture.

Why retrieval matched EviVault better

Three reasons made the decision clear.

1. The documents change

Policy documents, internal guides, operational procedures, and team references are not fixed assets. They evolve. A leave policy can be revised. A compliance checklist can be updated. A workflow can change after an audit or process redesign.

A retrieval-first system fits that reality better.

When a new file is uploaded, the platform can parse it, chunk it, embed it, and make it searchable. When an outdated file is removed, it can leave the corpus. The knowledge base changes through indexing, not through retraining cycles.
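A toy index makes the point concrete: knowledge enters and leaves the system through indexing operations, never retraining. The `DocumentIndex` class below is an illustrative in-memory stand-in for a real vector store such as ChromaDB:

```python
class DocumentIndex:
    """Toy corpus index: chunks are added and removed per source file."""

    def __init__(self) -> None:
        # chunk_id -> (source_file, chunk_text)
        self._chunks: dict[str, tuple[str, str]] = {}

    def index_file(self, source: str, chunks: list[str]) -> None:
        # A newly uploaded file becomes searchable immediately.
        for i, text in enumerate(chunks):
            self._chunks[f"{source}:{i}"] = (source, text)

    def remove_file(self, source: str) -> None:
        # An outdated file simply leaves the corpus; no model is touched.
        self._chunks = {
            cid: v for cid, v in self._chunks.items() if v[0] != source
        }

    def sources(self) -> set[str]:
        return {src for src, _ in self._chunks.values()}
```

Replacing a revised policy is a delete followed by an add, which is exactly the maintenance profile an evolving document set needs.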

That makes the system more maintainable and more responsive to the real pace of organizational change.

A fine-tuned model can still be useful in such an environment, but using it as the main store of evolving policy knowledge adds more maintenance pressure than I wanted for this product.

2. Users need evidence, not just answers

Internal tools earn trust when people can inspect the source.

That requirement alone strongly favored retrieval.

EviVault was shaped around filenames, excerpts, chunk references, confidence indicators, and an evidence panel in the interface. Those features are much more natural when the system is literally retrieving the passages that informed the answer.
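One plausible shape for an answer payload behind such an evidence panel, with illustrative field names rather than EviVault's actual schema, is:

```python
def build_answer_payload(answer: str, chunks: list[dict]) -> dict:
    """Attach the retrieved passages to the answer so the UI can show evidence."""
    return {
        "answer": answer,
        "citations": [
            {
                "filename": c["filename"],       # which file the passage came from
                "excerpt": c["text"][:200],      # short quote users can inspect
                "chunk_id": c["chunk_id"],       # stable reference into the corpus
                "similarity": round(c["similarity"], 3),  # retrieval strength signal
            }
            for c in chunks
        ],
    }
```

Because the citations are the literal retrieval results, the evidence panel requires no extra inference step; the provenance comes for free.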

A retrieval-first architecture lets the product say, in effect:

Here is the answer, and here is where it came from.

That is very different from saying:

The model knows this because it was trained on something like it.

The first is easier to trust. The second is harder to validate.

This is especially important for internal questions involving operations, policy, process, or compliance. In those settings, the source is not optional context. The source is part of the product value.

3. Honest uncertainty matters

One of the strongest ideas in EviVault is that abstention is a valid answer.

If retrieval returns weak evidence, the system should pause. It should not turn a vague match into a confident paragraph.

That behavior becomes easier to implement when the architecture already revolves around retrieval quality. The system can inspect similarity scores, check whether anything relevant was actually found, and decide whether the answer is grounded enough to present.

A simplified version of that quality gate looks like this:

```python
def generate_answer(question: str, chunks: list[dict]) -> dict:
    if not chunks:
        return {
            "answer": "I don't have enough evidence to answer this question.",
            "citations": [],
            "grounded": False,
            "confidence": "none",
        }
    top_similarity = chunks[0]["similarity"]
    if top_similarity < 0.35:
        return {
            "answer": "The available documents do not contain sufficient evidence to answer this reliably.",
            "citations": chunks[:3],
            "grounded": False,
            "confidence": "insufficient",
        }
    ...
```

That threshold logic is not just a technical detail. It reflects a product choice: weak evidence should not become polished certainty.

Retrieval made that choice easier to express and enforce.

The trust advantage of RAG

The strongest reason I chose RAG is not fashion. It is trust.

A lot of AI systems sound impressive long before they become dependable. They produce smooth language, broad coverage, and plausible explanations. The danger is that fluency can hide weak grounding.

For internal knowledge systems, that is not good enough.

A useful assistant should help people do at least three things:

  • get to the likely answer faster
  • inspect the evidence behind the answer
  • recognize when the system is not confident enough to answer well

RAG supports all three more naturally than an architecture built primarily on fine-tuning.

It improves access because retrieval surfaces relevant passages quickly.

It improves inspectability because the answer can travel with citations and excerpts.

It improves honesty because the system can measure retrieval quality before generation.

That combination was exactly what EviVault needed.

The operational advantage of RAG

There is also a practical engineering side to this decision.

I wanted EviVault to be usable in realistic environments, including ones that care about privacy, cost control, and local deployment options. That made local-first retrieval choices attractive.

The project uses a lightweight embedding model, all-MiniLM-L6-v2, and ChromaDB as a persistent vector store. That combination supports a compact retrieval stack without making external embedding APIs mandatory.

The retrieval flow is straightforward:

```python
query_embedding = ef([query])[0]
results = collection.query(
    query_embeddings=[query_embedding],
    n_results=top_k,
    include=["documents", "metadatas", "distances"],
)
```

A small but important detail in the implementation is converting ChromaDB’s returned distance into a more intuitive similarity score:

```python
# Assumes the collection uses cosine distance, where distance = 1 - similarity.
similarity = 1.0 - distance
```

That score then feeds both backend quality logic and frontend trust signals.
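On the frontend side, a mapping like the following could translate that similarity into a displayable confidence label. The tiers and thresholds here are illustrative assumptions, not EviVault's actual values:

```python
def confidence_label(similarity: float) -> str:
    """Bucket a 0-1 similarity score into a user-facing confidence tier."""
    if similarity >= 0.75:
        return "high"
    if similarity >= 0.5:
        return "medium"
    if similarity >= 0.35:
        return "low"
    # Below the abstention threshold: the UI should signal, not answer.
    return "insufficient"
```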

This kind of implementation is easier to reason about when retrieval is the core of the system. The architecture stays modular. The ingestion pipeline, vector store, retrieval logic, answer generation, and user interface can evolve independently.

That modularity matters in production work. Systems become easier to test, easier to debug, and easier to improve over time.

Where fine-tuning still fits

Choosing RAG did not mean rejecting fine-tuning entirely.

It meant choosing the right primary mechanism for the product.

Fine-tuning still has a role in systems like this. It can help with:

  • consistent response style
  • structured answer formats
  • domain-specific phrasing
  • task-specific behaviors
  • classification or routing patterns

In other words, fine-tuning can be a strong secondary layer.

A mature version of a system like EviVault might absolutely use both:

  • retrieval for fresh, attributable knowledge
  • fine-tuning for behavior, consistency, or specialized output patterns

That hybrid future makes sense.

But for the first major architectural choice, retrieval was the clearer, safer, and more product-aligned foundation.

Why this mattered for the website story too

This decision also shaped how I tell the story of the project.

If I had built EviVault as a “fine-tuned internal chatbot,” the story would center more on model adaptation. But that would miss the more important product insight.

The deeper lesson from this project is not that a model can be customized. It is that internal AI systems become more useful when they are built around evidence, visibility, and disciplined uncertainty.

That is why the supporting parts of the platform matter so much:

  • chunking with overlap
  • semantic retrieval over indexed passages
  • confidence scoring
  • abstention below a threshold
  • evidence panels in the UI
  • per-user scoping and predictable boundaries

Those pieces make more sense together when retrieval is the backbone.

What I would recommend to other builders

If you are building an internal knowledge assistant, start by asking what problem you are really solving.

If the challenge is mainly tone, formatting, or model behavior, fine-tuning may deserve stronger consideration.

If the challenge is changing knowledge, source attribution, answer verification, and trust, retrieval will often be the better first architecture.

That does not mean RAG is effortless. Retrieval quality still depends on good ingestion, sensible chunking, reliable embeddings, careful filtering, and strong product decisions around evidence display and refusal behavior.

But those are exactly the kinds of design choices that make internal AI systems more dependable in practice.

Final Thoughts

I chose RAG over fine-tuning for EviVault because retrieval matched the problem better.

The product needed living knowledge, visible evidence, inspectable answers, and a clean way to say, “I do not have enough support for that.” Retrieval made those goals practical.

Fine-tuning remains useful. It may still have a place in the system as it evolves. But as the foundation for a trusted internal knowledge assistant, RAG was the right first choice.

For this project, the goal was never just to make the assistant answer.

It was to make the assistant answer with footing.

Tags: RAG · Fine-Tuning · AI Engineering · FastAPI · ChromaDB · Trustworthy AI · Internal Tools