Organizations rarely struggle to create knowledge. They struggle to find it, trust it, and use it quickly.
That problem shows up in familiar ways. Someone searches a shared drive, opens three policy documents, skims thirty pages, then messages a colleague anyway. A manager asks an internal assistant a simple operational question and gets a polished answer that sounds right, but nobody can tell which document supported it. Confidence rises. Trust drops.
EviVault Assistant grew from that tension.
I did not want to build another “chat with your files” demo. I wanted to build a document intelligence system that treated retrieval as the center of the product, evidence as part of the user experience, and refusal as a valid outcome when the system lacked strong support.
What EviVault Assistant is
EviVault Assistant is a full-stack document intelligence platform for internal knowledge search and question answering. Users upload policy documents, onboarding guides, operational procedures, and related files. The system extracts text, chunks it, embeds it, stores it in a vector database, retrieves relevant passages for a user query, and returns an answer grounded in those passages.
That sounds straightforward. The real product challenge lives in the details:
- How do you make retrieval strong enough to support useful answers?
- How do you show evidence in a way people will actually inspect?
- How do you handle low-confidence cases without pretending certainty?
- How do you keep the system practical for teams that care about privacy, maintainability, and control?
Those questions shaped nearly every design decision in the system.
The design principles
Three principles drove the platform.
Grounding first
The answer should come from retrieved evidence, not model improvisation. Language models are excellent at generating fluent text. Internal knowledge systems need more than fluency. They need answers tied to documents people recognize and can verify.
Auditability as a product feature
People trust internal tools when they can inspect the source. That means filenames, excerpts, chunk references, and visible confidence signals should be product features, not backend leftovers.
Graceful failure
When retrieval quality is weak, the system should step back. A misleading answer about policy, compliance, or process creates real downstream cost. Weak evidence should not become a polished claim.
Why I chose RAG over fine-tuning
A common instinct with internal knowledge systems is to ask whether the model should be fine-tuned on company data. In many document workflows, retrieval is the better first move.
Fine-tuning pushes knowledge into model weights. Retrieval keeps knowledge in the document layer. That difference matters. When documents change, retrieval lets you re-index rather than retrain. When users want proof, retrieval gives you a path back to the source file and chunk. When teams care about privacy and local control, retrieval can fit more naturally into that architecture.
For EviVault, retrieval matched the problem. The target documents were policy files, operational references, and internal guides. Those sources change. People need to verify them. RAG fit the use case better than trying to bury knowledge inside model parameters.
At a high level, the system flow is simple:
```text
Upload → Ingest → Retrieve → Answer
```
That four-stage shape kept the architecture understandable and made each layer easier to evolve independently.
The architecture
Under the hood, EviVault is split into a small set of clear layers:
- React frontend
- FastAPI backend
- ChromaDB vector store
- SentenceTransformers embeddings
- Relational persistence for users, documents, chunk metadata, and logs
- Optional LLM-backed synthesis with extractive fallback
That structure matters. It keeps the product legible. It also lets retrieval metadata flow cleanly from backend to interface, which is important when trust features are part of the user experience rather than hidden implementation details.
A thin FastAPI route coordinates the core Q&A pipeline:
```python
@router.post("/ask", response_model=AskResponse)
def ask_question(
    payload: AskRequest,
    db: Session = Depends(get_db),
    user: User = Depends(get_current_user),
):
    chunks = retrieve_chunks(
        query=payload.question,
        top_k=5,
        user_id=user.id,
        db=db,
    )
    result = generate_answer(payload.question, chunks)

    # Log the query
    log = QueryLog(
        user_id=user.id,
        question=payload.question,
        answer=result["answer"][:2000],
        grounded=result["grounded"],
        confidence=result["confidence"],
        retrieved_chunks=len(chunks),
    )
    db.add(log)
    db.commit()

    return AskResponse(**result)
```
That route reflects one of the strongest engineering choices in the project: the API layer stays deliberately thin, while retrieval, generation, and persistence stay in their own services and models.
Ingestion and retrieval choices
The ingestion pipeline matters more than many teams expect. A retrieval system is only as useful as the text it indexes.
EviVault supports practical internal document formats first: PDF, DOCX, TXT, and Markdown. That keeps the ingestion layer focused while still covering a large share of real operational knowledge.
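The extraction layer itself is not shown here, so the sketch below is illustrative: it assumes a registry that routes files to per-format handlers by extension. The `extract_text` and `_extract_txt` names are hypothetical, and the PDF and DOCX handlers (which in practice would wrap libraries such as pypdf and python-docx) are stubbed out.

```python
from pathlib import Path


def _extract_txt(path: Path) -> str:
    # Plain text and Markdown can be read directly.
    return path.read_text(encoding="utf-8", errors="replace")


# Hypothetical registry: real PDF and DOCX handlers would wrap parsing
# libraries; they are left as placeholders in this sketch.
EXTRACTORS = {
    ".txt": _extract_txt,
    ".md": _extract_txt,
    ".pdf": None,   # placeholder for a PDF page-text extractor
    ".docx": None,  # placeholder for a DOCX paragraph walker
}


def extract_text(path: Path) -> str:
    suffix = path.suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"Unsupported format: {suffix}")
    handler = EXTRACTORS[suffix]
    if handler is None:
        raise NotImplementedError(f"No handler wired up in this sketch for {suffix}")
    return handler(path)
```

Keeping the format list explicit makes the failure mode predictable: an unsupported upload is rejected at ingestion, not silently indexed as garbage.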
The chunking logic uses overlap so answers that sit near boundaries do not get split into unusable fragments:
```python
def chunk_text(text: str, chunk_size: int | None = None, overlap: int | None = None) -> list[dict]:
    chunk_size = chunk_size or settings.CHUNK_SIZE
    overlap = overlap or settings.CHUNK_OVERLAP

    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))

        # Try to break at sentence boundary
        if end < len(text):
            for sep in ["\n\n", "\n", ". ", "! ", "? "]:
                last = text[start:end].rfind(sep)
                if last > chunk_size // 2:
                    end = start + last + len(sep)
                    break

        chunk_content = text[start:end].strip()
        if chunk_content:
            chunks.append({
                "content": chunk_content,
                "char_start": start,
                "char_end": end,
            })

        start = end - overlap if end < len(text) else end

    return chunks
```
The chunker tries to break at sentence or paragraph boundaries before falling back to character splits. The goal is simple: preserve local context and keep continuity across neighboring chunks.
For embeddings, EviVault uses all-MiniLM-L6-v2 as a pragmatic local-first choice. It is lightweight, CPU-friendly, and suitable for internal semantic retrieval without introducing unnecessary infrastructure overhead. ChromaDB handles persistent vector storage and cosine-based nearest-neighbor search.
A small detail in the retrieval layer matters more than it looks:
```python
similarity = 1.0 - distance
```
ChromaDB returns cosine distance, not similarity, so the code converts it into a score that is easier for both product logic and UI display.
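To make that conversion concrete, here is a self-contained sketch that ranks a few toy vectors by cosine distance and converts the results to similarity scores, standing in for what the retrieval layer does with ChromaDB results. The vectors and chunk IDs are made up; real embeddings from all-MiniLM-L6-v2 are 384-dimensional.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    # Cosine distance = 1 - cosine similarity, the convention ChromaDB uses.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm


# Toy stand-ins for embedded chunks in the vector store.
index = {
    "pto_policy#3": [0.9, 0.1, 0.0],
    "onboarding#1": [0.1, 0.9, 0.2],
    "expenses#7":   [0.2, 0.2, 0.9],
}
query_vec = [0.85, 0.15, 0.05]

# Rank by distance (nearest first), as a vector store would.
results = sorted(
    ((chunk_id, cosine_distance(query_vec, vec)) for chunk_id, vec in index.items()),
    key=lambda pair: pair[1],
)

# Convert distance back into a similarity score for product logic and UI.
scored = [(chunk_id, 1.0 - dist) for chunk_id, dist in results]
```

In the running example, the query vector lands closest to `pto_policy#3`, and its similarity score is what downstream gates and confidence badges would consume.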
This retrieval approach matters because internal questions are often phrased differently from the source text. Someone may ask, “How many vacation days do I get?” while the policy says “annual paid time off entitlement.” Dense retrieval closes that gap better than plain keyword search.
Confidence, abstention, and product discipline
This is the heart of the system.
Many assistants are evaluated by how often they answer. EviVault is built around knowing when it should answer and when it should hold back. The generation service applies a quality gate before it attempts synthesis. If no chunk is retrieved, or if the best similarity score is too weak, the system abstains.
```python
def generate_answer(question: str, chunks: list[dict]) -> dict:
    if not chunks:
        return {
            "answer": "I don't have enough evidence in the uploaded documents to answer this question. Please try rephrasing or upload relevant documents.",
            "citations": [],
            "grounded": False,
            "confidence": "none",
        }

    top_similarity = chunks[0]["similarity"] if chunks else 0
    if top_similarity < 0.35:
        return {
            "answer": "The available documents do not contain sufficient evidence to answer this question reliably. The closest matches were not relevant enough to provide a grounded response.",
            "citations": [
                {"filename": c["filename"], "chunk_index": c["chunk_index"], "similarity": c["similarity"]}
                for c in chunks[:3]
            ],
            "grounded": False,
            "confidence": "insufficient",
        }

    context_block = build_context_block(chunks)
    system_prompt = load_system_prompt()
    # ... LLM call or extractive fallback produces answer_text ...

    confidence = "high" if top_similarity > 0.6 else "medium" if top_similarity > 0.4 else "low"

    citations = []
    seen = set()
    for c in chunks:
        key = (c["filename"], c["chunk_index"])
        if key not in seen:
            citations.append({
                "filename": c["filename"],
                "chunk_index": c["chunk_index"],
                "similarity": c["similarity"],
                "excerpt": c["content"][:200] + "..." if len(c["content"]) > 200 else c["content"],
            })
            seen.add(key)

    return {
        "answer": answer_text,
        "citations": citations,
        "grounded": True,
        "confidence": confidence,
    }
```
That threshold-based gate is one of the strongest product decisions in the project. It protects user trust, reduces fabricated policy answers, and creates a cleaner path for follow-up. In EviVault, abstention is not framed as failure. It is framed as disciplined behavior.
The system also includes an extractive fallback when no external generation API is configured. That means the product can still return useful, evidence-backed excerpts in more constrained or local-first environments.
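The fallback itself is not shown above, so the following is a minimal sketch of what an extractive path could look like: it returns top-ranked excerpts verbatim instead of synthesizing prose. The function name `extractive_answer` is illustrative, and the chunk fields mirror the citation schema used elsewhere in the pipeline.

```python
def extractive_answer(chunks: list[dict], max_excerpts: int = 3) -> dict:
    """Return evidence-backed excerpts when no LLM backend is configured.

    A sketch under the assumption that each chunk carries "filename",
    "content", and "similarity" keys, as in the retrieval layer above.
    """
    excerpts = []
    for c in chunks[:max_excerpts]:
        # Quote the source directly; no synthesis, no paraphrase.
        excerpts.append(f'From "{c["filename"]}": {c["content"][:300]}')
    return {
        "answer": "Based on the most relevant passages:\n\n" + "\n\n".join(excerpts),
        "grounded": True,
        "confidence": "medium",  # evidence-backed, but unsynthesized
    }
```

The trade-off is deliberate: a quoted excerpt is less fluent than a generated answer, but it can never fabricate a claim the documents do not contain.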
Making trust visible in the interface
Grounding is not only a backend property. It has to be visible.
That is why the frontend surfaces an evidence panel, confidence badges, and grounded-versus-ungrounded signals directly in the chat experience. The UI does not hide the evidence behind extra clicks. It keeps the latest answer’s supporting citations in view.
The API returns an answer together with citations and trust metadata:
```python
class Citation(BaseModel):
    filename: str
    chunk_index: int
    similarity: float
    excerpt: Optional[str] = None


class AskResponse(BaseModel):
    answer: str
    citations: list[Citation]
    grounded: bool
    confidence: str
```
That structure makes it easy for the frontend to render not just content, but evidence and confidence as first-class parts of the experience.
On the React side, the message object carries those signals forward:
```jsx
async function sendQuestion(question) {
  setInput("");
  setMessages((prev) => [...prev, { role: "user", content: question }]);
  setLoading(true);
  setCitations([]);

  try {
    const data = await askQuestion(question);
    setMessages((prev) => [
      ...prev,
      {
        role: "assistant",
        content: data.answer,
        confidence: data.confidence,
        grounded: data.grounded,
      },
    ]);
    setCitations(data.citations || []);
  } catch (err) {
    setMessages((prev) => [
      ...prev,
      { role: "assistant", content: `Error: ${err.message}`, isError: true },
    ]);
  } finally {
    setLoading(false);
  }
}
```
That pattern helps bridge the gap between backend retrieval logic and the human decision about whether to trust the answer.
Security, scoping, and predictable boundaries
A good AI product still has to be a good product.
EviVault uses JWT-based authentication, role-aware access, and per-user document scoping. The vector store handles similarity search, while ownership enforcement stays in the application layer where it belongs. That separation keeps authorization logic explicit and maintainable.
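EviVault uses real JWTs (in practice via a library such as PyJWT); the sketch below is only a stand-in that illustrates the same sign-then-verify pattern with a simplified HMAC-signed token. The `sign_token` and `verify_token` names, the secret, and the payload shape are all illustrative, not the project's actual implementation.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # illustrative only; a real deployment loads this from config


def sign_token(payload: dict) -> str:
    # A simplified HMAC-signed token standing in for a real JWT.
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"


def verify_token(token: str) -> dict:
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing side channels.
    if not hmac.compare_digest(sig, expected):
        raise ValueError("Invalid signature")
    payload = json.loads(base64.urlsafe_b64decode(body))
    if payload.get("exp", 0) < time.time():
        raise ValueError("Token expired")
    return payload
```

The important part is the division of labor it enables: token verification yields a trusted user identity, and every document query is then filtered by that identity in the application layer.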
A representative ownership check looks like this:
```python
doc = db.query(Document).filter(
    Document.id == doc_id,
    Document.owner_id == user.id,
).first()
if not doc:
    raise HTTPException(status_code=404, detail="Document not found")
```
Returning 404 rather than 403 for unauthorized document access is a thoughtful touch. It avoids leaking information about resources the user should not even know exist.
These details may feel ordinary next to embeddings and retrieval, but they are central to real product trust. Internal AI tools succeed when they respect clear auth boundaries, understandable failures, and predictable workflows.
What this project taught me
EviVault reinforced a few ideas that now shape how I think about AI applications more broadly.
First, trust starts before generation. Prompting and model choice matter, but retrieval quality, evidence presentation, and refusal behavior matter more in real internal use.
Second, abstention improves product integrity. A system that occasionally says it does not have strong enough support for an answer often earns more trust than one that responds to everything.
Third, grounding should be visible. Source metadata and excerpts belong in the product, not buried in backend traces.
Fourth, practical deployment constraints shape architecture. Local embeddings, extractive fallback, and modular retrieval were not shortcuts. They were deliberate product choices aimed at building something usable in realistic environments.
Final Thoughts
EviVault Assistant was my way of exploring a simple belief: internal AI systems become more valuable when they are evidence-aware, operationally realistic, and honest about uncertainty.
That belief led to a platform centered on grounded retrieval, citation-backed answers, visible confidence signals, and abstention when evidence was weak. The result is not just a RAG demo. It is a document intelligence system shaped around trust.
If you are building an internal assistant for policy, operations, or enterprise knowledge, that trust layer deserves as much attention as the model layer. In many cases, it deserves more.
Deep Dives
Each layer of EviVault is explored in detail in the articles below, ordered from foundation upward.
- Why I Chose RAG Over Fine-Tuning: the architectural decision that shaped everything else.
- The Ingestion Pipeline: From Raw File to Searchable Chunks: text extraction, chunking, and boundary alignment.
- Vector Store and Semantic Retrieval: embeddings, ChromaDB, and closing the vocabulary gap.
- Assembling the RAG Pipeline: The /ask Endpoint: wiring retrieval, generation, and logging together.
- Building Trust Into the UI: The Evidence Panel: making grounding visible as a product feature.
- Securing EviVault: Authentication, Access Control, and Predictable Boundaries: JWT auth, per-user scoping, and safe failure modes.

