Enterprise AI · Document Intelligence · RAG

EviVault Assistant: RAG-Powered Enterprise Document Intelligence Platform

A full-stack RAG platform that transforms organizational policy documents into an intelligent assistant, delivering citation-backed answers grounded in enterprise knowledge.

React 19 · FastAPI · SQLAlchemy 2.0 · ChromaDB · SentenceTransformers · JWT Auth · RAG Pipeline

Overview

Project type: Enterprise document intelligence / retrieval-augmented generation
Role: Product architecture, ingestion workflow design, RAG pipeline implementation, backend API design, frontend delivery
Core value: Replaces fragmented document search with grounded answers backed by verifiable evidence
Stack: React 19, FastAPI, ChromaDB, SentenceTransformers, JWT auth, RAG pipeline

At a glance

  • Ingests PDF, DOCX, TXT, and Markdown documents
  • Uses semantic retrieval over chunked internal content
  • Returns citation-backed answers with evidence metadata
  • Abstains when retrieval quality is too weak
  • Supports local embeddings and no-LLM fallback for constrained environments
  • Designed for trust, auditability, and upgradeable enterprise deployment

The Problem

Organizations accumulate large volumes of internal knowledge: onboarding guides, infrastructure policies, compliance rules, safety documents, and operational procedures.

The problem is rarely knowledge creation. It is knowledge access.

Common failure modes include:

  • Search fatigue: employees search shared drives, skim long PDFs, and still leave unsure whether they found the right answer
  • Tribal knowledge dependency: critical answers live with a small number of experienced people, creating delays and operational risk
  • Hallucination risk: off-the-shelf LLM tools can produce plausible answers that are not grounded in actual company policy
  • No audit trail: traditional internal Q&A workflows rarely show which exact document passage supports the answer

For policy, safety, and compliance-heavy environments, “probably correct” is not good enough. Teams need answers they can verify.

The Approach

EviVault Assistant is a retrieval-augmented document intelligence platform built around a strict principle:

Every answer should be backed by evidence. If the evidence is weak, the system should abstain.

The workflow follows four stages:

  1. Upload: users add internal documents in PDF, DOCX, TXT, or Markdown format
  2. Ingest: the pipeline extracts text, chunks content into overlapping segments, generates embeddings, and stores them in a vector database
  3. Retrieve: user questions trigger semantic search over the indexed chunks
  4. Answer: the system returns a grounded answer with citations, confidence metadata, and source excerpts

Why this design

The platform was designed around three practical goals:

  • Grounding: answers must come from retrieved internal evidence, not model guesswork
  • Auditability: users should be able to inspect the source passage behind each response
  • Graceful failure: when retrieval is weak, the system should refuse rather than hallucinate

Architecture

EviVault Assistant Architecture Diagram

Architecture highlights

  • Clear separation between ingestion, retrieval, generation, and persistence layers
  • Grounding metadata carried from retrieval into the answering interface
  • Optional generation layer with extractive fallback for environments without external LLM access
  • Vector store abstraction designed for future backend swaps without breaking the application contract

Key Technical Decisions

1. RAG over fine-tuning

RAG was chosen over fine-tuning because document-backed enterprise Q&A needs traceability.

Fine-tuning stores knowledge in model weights, making updates slower and answer provenance harder to inspect. RAG keeps knowledge in the document layer, allowing answers to reference specific retrieved chunks. For internal policy and compliance use cases, that traceability is more important than stylistic fluency.

2. Similarity threshold for abstention

Left unchecked, a language model will usually produce an answer even when the retrieved context is weak or irrelevant.

To prevent confident but ungrounded responses, the system applies a retrieval quality gate. If no retrieved chunk exceeds a cosine similarity threshold of 0.35, the platform abstains rather than answering. This is a deliberate product decision: in high-trust settings, no answer is safer than a fabricated one.
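A minimal sketch of that gate, assuming retrieval yields one similarity score per chunk. `cosine_similarity` and `should_abstain` are illustrative names, not the project's code; only the 0.35 threshold and the strict "exceeds" comparison come from the text. If the vector store reports cosine distance instead (as ChromaDB does for a collection configured with the cosine space), the similarity would typically be recovered as `1 - distance` before applying the gate.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD = 0.35  # abstention threshold stated in the write-up


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_abstain(scores: Sequence[float], threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Abstain unless at least one retrieved chunk strictly exceeds the threshold."""
    return not any(score > threshold for score in scores)
```

Because the comparison is strict, a best score of exactly 0.35 still abstains, and an empty retrieval result always abstains.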

3. Local embeddings with all-MiniLM-L6-v2

The embedding model was selected as a pragmatic local-first option.

all-MiniLM-L6-v2 provides strong semantic retrieval quality with lightweight CPU-friendly inference and no mandatory API dependency. This makes the platform more suitable for internal deployments where document content should remain within the organization boundary.

4. ChromaDB for embedded vector search

ChromaDB was chosen because it keeps the vector layer simple and self-contained.

It runs locally, persists to disk, and avoids additional infrastructure for early-stage internal deployments. The platform also uses a vector store abstraction, so migration to a managed service later can be done with limited code changes.
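One way such an abstraction could look, sketched with a `typing.Protocol` and a brute-force in-memory backend. `VectorStore`, `StoredChunk`, `InMemoryVectorStore`, `retrieve`, and the `owner_id`/`status` metadata keys are hypothetical; a ChromaDB-backed class (or a managed service) would implement the same two methods. The `where` filter illustrates the per-user and document-status scoping listed under Features.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol, Tuple


@dataclass
class StoredChunk:
    chunk_id: str
    embedding: List[float]
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class VectorStore(Protocol):
    """The contract application code depends on; any backend can implement it."""

    def add(self, chunks: List[StoredChunk]) -> None: ...

    def query(
        self, embedding: List[float], top_k: int, where: Dict[str, Any]
    ) -> List[Tuple[StoredChunk, float]]: ...


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class InMemoryVectorStore:
    """Toy backend: exact-match metadata filter, then brute-force cosine ranking."""

    def __init__(self) -> None:
        self._chunks: Dict[str, StoredChunk] = {}

    def add(self, chunks: List[StoredChunk]) -> None:
        for chunk in chunks:
            self._chunks[chunk.chunk_id] = chunk  # upsert by id

    def query(
        self, embedding: List[float], top_k: int, where: Dict[str, Any]
    ) -> List[Tuple[StoredChunk, float]]:
        candidates = [
            c for c in self._chunks.values()
            if all(c.metadata.get(key) == value for key, value in where.items())
        ]
        scored = [(c, _cosine(embedding, c.embedding)) for c in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


def retrieve(store: VectorStore, embedding: List[float], user_id: str) -> List[Tuple[StoredChunk, float]]:
    """Application-side call: top-5, scoped to the user's ready documents."""
    return store.query(embedding, top_k=5, where={"owner_id": user_id, "status": "ready"})
```

Since `retrieve` only sees the `VectorStore` contract, swapping the backend means writing one new class rather than touching the retrieval or answering code.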

5. Extractive fallback when no LLM is configured

Not every deployment can depend on an external generation API.

When no OpenAI key is configured, the platform falls back to an extractive answer: instead of generating a synthesized response, it returns the most relevant retrieved chunk as the answer. This lets the system degrade gracefully, support local development, and remain useful in restricted environments.
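A sketch of how the answer path could branch on whether a generator is configured. `Evidence`, `Answer`, `answer_question`, and the `generate` callable are hypothetical stand-ins (the real system calls the OpenAI API); the evidence fields mirror the Evidence Panel list, and the 0.35 gate comes from the abstention section.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Evidence:
    filename: str
    chunk_index: int
    similarity: float
    excerpt: str


@dataclass
class Answer:
    text: str
    grounded: bool
    evidence: List[Evidence] = field(default_factory=list)


# Hypothetical generator signature: (question, context passages) -> answer text.
Generator = Callable[[str, List[str]], str]

ABSTAIN_MESSAGE = "I could not find enough evidence in the indexed documents to answer that."


def answer_question(
    question: str,
    hits: List[Evidence],
    generate: Optional[Generator] = None,
    threshold: float = 0.35,
) -> Answer:
    ranked = sorted(hits, key=lambda h: h.similarity, reverse=True)
    if not ranked or ranked[0].similarity <= threshold:
        return Answer(ABSTAIN_MESSAGE, grounded=False)
    if generate is None:
        # Extractive fallback: no LLM configured, so the best chunk is the answer.
        best = ranked[0]
        return Answer(best.excerpt, grounded=True, evidence=[best])
    # LLM path: synthesize only from passages that clear the gate, and cite them all.
    # Here "grounded" only means the retrieval gate passed.
    passing = [h for h in ranked if h.similarity > threshold]
    return Answer(generate(question, [h.excerpt for h in passing]), grounded=True, evidence=passing)
```

Both branches return the same `Answer` shape, so the frontend's evidence panel does not need to know which path produced the text.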

6. JWT for stateless authentication

JWT was selected to support stateless authentication and future multi-client access patterns such as APIs, CLI tools, or additional frontends.

The trade-off is more complex token revocation, but for the current architecture that trade-off is acceptable.

Features

RAG Pipeline

The core pipeline is organized into four stages:

Stage | Implementation | Detail
Extract | pypdf, python-docx, plain text | Supports PDF, DOCX, TXT, MD
Chunk | 512-char windows, 64-char overlap | Overlap improves retrieval continuity
Embed | SentenceTransformers (all-MiniLM-L6-v2) | 384-dim vectors with local inference
Retrieve | ChromaDB cosine similarity, top-5 | Scoped by user access and document status
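The chunking stage can be sketched as fixed character windows that each repeat the tail of the previous window. `chunk_text` is an illustrative name, and the sketch assumes purely character-based splitting; the project may handle word or sentence boundaries differently. It returns start and end offsets because the data model stores offsets per chunk.

```python
from typing import List, Tuple


def chunk_text(text: str, size: int = 512, overlap: int = 64) -> List[Tuple[int, int, str]]:
    """Split text into fixed character windows; each window repeats the last
    `overlap` characters of the previous one.

    Returns (start, end, content) tuples so offsets can be stored with each chunk.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: List[Tuple[int, int, str]] = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append((start, end, text[start:end]))
        if end == len(text):
            break  # the final window already reaches the end of the text
        start += step
    return chunks
```

With the 512/64 settings, windows start every 448 characters, so a 1,000-character document yields three chunks starting at offsets 0, 448, and 896. A sentence that straddles a window boundary still appears whole in at least one chunk as long as it fits within the 64-character overlap.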

Confidence & Grounding

Every answer includes structured evidence signals:

Signal | Meaning
Confidence: High | Top chunk similarity > 0.6
Confidence: Medium | Top chunk similarity > 0.4
Confidence: Low | Top chunk similarity > 0.35
Abstention | No chunk exceeds 0.35, so the system refuses
Grounded: true/false | Whether the response is supported by retrieved text
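The bands in this table reduce to a few ordered comparisons on the best chunk's score. `confidence_label` and its string values are illustrative, not the project's identifiers; the thresholds are the ones listed.

```python
from typing import Optional


def confidence_label(top_similarity: Optional[float]) -> str:
    """Map the best chunk's similarity to the confidence bands in the table."""
    if top_similarity is None or top_similarity <= 0.35:
        return "abstain"  # no chunk exceeds 0.35: refuse, grounded = False
    if top_similarity > 0.6:
        return "high"
    if top_similarity > 0.4:
        return "medium"
    return "low"  # 0.35 < score <= 0.4
```

Checking from the strictest band downward means each score lands in exactly one band, and the boundary values (0.6, 0.4, 0.35) fall into the lower band, matching the strict "greater than" wording.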

Evidence Panel

Each answer surfaces:

  • filename
  • chunk index
  • similarity score
  • retrieved excerpt

This creates an auditable chain from question to evidence, helping users verify not only the answer, but the reasoning basis behind it.

Document Management

  • Upload PDF, DOCX, TXT, and MD documents
  • Automated extract → chunk → embed → store pipeline
  • Processing status tracking: processing, ready, failed
  • Per-user document scoping
  • Visible chunk counts and metadata per document
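A sketch of how the extract → chunk → embed → store pipeline could record its processing status. `Status`, `DocumentRecord`, `ingest`, and the three callable hooks are hypothetical stand-ins for the parsers, the embedding model, and the vector store; the status values and the 512/64 chunking come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Status(str, Enum):
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"


@dataclass
class DocumentRecord:
    filename: str
    owner_id: str
    status: Status = Status.PROCESSING
    chunk_count: int = 0
    error: str = ""


# Hypothetical hooks standing in for the parser, the embedding model, and the vector store.
Extractor = Callable[[bytes], str]
Embedder = Callable[[List[str]], List[List[float]]]
Writer = Callable[[List[str], List[List[float]]], None]


def ingest(doc: DocumentRecord, raw: bytes, extract: Extractor, embed: Embedder,
           store: Writer, size: int = 512, overlap: int = 64) -> DocumentRecord:
    """Run extract -> chunk -> embed -> store and record the outcome on `doc`."""
    doc.status = Status.PROCESSING
    try:
        text = extract(raw)
        if not text.strip():
            raise ValueError("no extractable text")
        step = size - overlap
        # Window starts at 0, step, 2*step, ... until the last window reaches the end.
        chunks = [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
        vectors = embed(chunks)
        store(chunks, vectors)
    except Exception as exc:  # a failure at any stage marks the document as failed
        doc.status = Status.FAILED
        doc.error = str(exc)
        return doc
    doc.chunk_count = len(chunks)
    doc.status = Status.READY
    return doc
```

Only documents that reach `ready` would be visible to retrieval, which is what keeps half-processed or failed uploads out of answers.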

Role-Based Access Control

  • Admin: full access, including user management
  • Researcher: upload documents, ask questions, and view analytics
  • User: ask questions against available documents
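The three roles map naturally onto a small permission table. `Role`, `PERMISSIONS`, `can`, and the action names are hypothetical; the write-up describes what each role may do, not how the check is coded. In FastAPI, a check like this could sit in a dependency that reads the role from the decoded JWT.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Role(str, Enum):
    ADMIN = "admin"
    RESEARCHER = "researcher"
    USER = "user"


# Illustrative action names derived from the role descriptions above.
PERMISSIONS: Dict[Role, FrozenSet[str]] = {
    Role.USER: frozenset({"ask"}),
    Role.RESEARCHER: frozenset({"ask", "upload", "view_analytics"}),
    Role.ADMIN: frozenset({"ask", "upload", "view_analytics", "manage_users"}),
}


def can(role: Role, action: str) -> bool:
    """Return True if the role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, frozenset())
```

Keeping the mapping in one table makes it easy to audit who can do what, and unknown roles or actions are denied by default.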

Admin Dashboard

  • User CRUD management
  • Usage analytics and query categories
  • Evaluation metrics for monitoring retrieval and answering behavior

Tech Stack

Layer | Technology
Frontend | React 19, React Router v6, Vite
Backend | FastAPI, SQLAlchemy 2.0, Pydantic v2
Vector DB | ChromaDB 0.5.11
Embeddings | SentenceTransformers (all-MiniLM-L6-v2)
LLM | OpenAI API (optional, with extractive fallback)
Auth | JWT (python-jose + passlib/bcrypt)
Database | SQLite (dev) / PostgreSQL 16 (prod)
Document Parsing | pypdf 5.0.1, python-docx 1.1.2
Testing | pytest + httpx
Containerization | Docker + Docker Compose

Data Model

The data model supports the full lifecycle from upload to grounded response:

Entity | Key Fields | Relationship
User | UUID, email, role, hashed password | Has many Documents, has many QueryLogs
Document | Filename, MIME type, chunk count, processing status | Belongs to User, has many DocumentChunks
DocumentChunk | Content slice, offsets, vector reference ID | Belongs to Document
QueryLog | Question, answer, grounded flag, confidence, retrieved chunk count | Belongs to User
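The QueryLog fields are enough to derive simple monitoring metrics for the admin dashboard. This is a sketch under assumptions: the dataclass stands in for the SQLAlchemy model, confidence is assumed to be stored as a string label, and the metric names in `summarize` are illustrative rather than the dashboard's actual metrics.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class QueryLog:
    question: str
    answer: str
    grounded: bool
    confidence: str  # assumed encoding: "high" | "medium" | "low" | "abstain"
    retrieved_chunk_count: int


def summarize(logs: List[QueryLog]) -> Dict[str, Any]:
    """Aggregate QueryLog rows into simple monitoring metrics (names are illustrative)."""
    total = len(logs)
    if total == 0:
        return {"total": 0, "grounded_rate": 0.0, "abstention_rate": 0.0,
                "confidence": {}, "avg_chunks_retrieved": 0.0}
    return {
        "total": total,
        "grounded_rate": sum(log.grounded for log in logs) / total,
        "abstention_rate": sum(log.confidence == "abstain" for log in logs) / total,
        "confidence": dict(Counter(log.confidence for log in logs)),
        "avg_chunks_retrieved": sum(log.retrieved_chunk_count for log in logs) / total,
    }
```

A rising abstention rate is a useful signal that users are asking about topics the indexed documents do not cover yet.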

Dataset

The seed script models an AI research lab with 5 internal policy documents:

  • AI Research Lab Handbook
  • AI Safety Guidelines
  • GPU Cluster Usage Policy
  • Model Release Checklist
  • Onboarding Guide

Three seeded users (one admin and two researchers) allow immediate testing of upload, retrieval, and grounded Q&A flows.

Results & Takeaways

Key outcomes

  • Built a citation-first enterprise Q&A workflow grounded in internal documents
  • Reduced dependence on keyword search and undocumented tribal knowledge
  • Added abstention logic to reduce ungrounded answering risk
  • Created an architecture that supports secure internal deployment and future retrieval upgrades

Technical takeaways

  • Grounded answers increase trust because users can inspect the exact evidence passage
  • Similarity-gated abstention is a product feature, not a system weakness
  • Local embeddings make the platform more viable for private and air-gapped environments
  • The architecture supports clear future upgrades such as reranking, async ingestion, managed vector databases, and query history visualization

Try It

The source is on GitHub.

# Backend (from the repository root)
cd backend && python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt && python seed.py
uvicorn app.main:app --reload --port 8002

# Frontend (in a second terminal, from the repository root)
cd frontend && npm install && npm run dev

Open http://localhost:5173 and sign in with the seeded credentials researcher@ola.ai / admin123.