Overview
A leading Fortune 500 insurance company was drowning in documents. Policy manuals, claims reports, regulatory filings, underwriting guidelines — terabytes of institutional knowledge locked in PDFs and Word files, inaccessible to the 2,000+ employees who needed it daily.
We built them a multi-tenant RAG platform that now processes over 2 million pages monthly, enabling instant, accurate answers from their entire document corpus.
The Challenge
- 4TB+ of legacy documents spanning 20+ years
- 12 distinct business units with different access controls
- Strict regulatory requirements around data residency
- Sub-3-second response time requirement
Our Approach
Document Ingestion Pipeline
We built an automated ingestion system using Apache Airflow that:
- Monitors designated SharePoint and S3 locations
- Classifies documents by type (policy, claims, regulatory) using a fine-tuned classifier
- Extracts text with layout-aware parsing (preserving tables, headers)
- Applies semantic chunking with parent-child relationships
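The parent-child chunking step above can be sketched as follows. This is an illustrative sketch only: the class names, ID scheme, and chunk size are assumptions, not the production implementation, and the real system splits on semantic boundaries rather than fixed character offsets.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chunk:
    chunk_id: str
    text: str
    parent_id: Optional[str] = None  # child chunks point back to their parent

def chunk_document(doc_id: str, sections: list[str],
                   child_size: int = 400) -> list[Chunk]:
    """Split each section (parent) into small child chunks.

    Child chunks are what get embedded and retrieved; the parent chunk
    is what gets passed to the LLM as fuller context at answer time.
    """
    chunks: list[Chunk] = []
    for s_idx, section in enumerate(sections):
        parent_id = f"{doc_id}-p{s_idx}"
        chunks.append(Chunk(parent_id, section))  # parent: full section
        for offset in range(0, len(section), child_size):
            chunks.append(Chunk(
                f"{parent_id}-c{offset // child_size}",
                section[offset:offset + child_size],
                parent_id=parent_id,
            ))
    return chunks
```

Retrieval then matches against the small child chunks for precision, but expands to the parent for generation, which is the main benefit of the parent-child layout.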
Embedding and Indexing
We chose a hybrid approach:
- Dense embeddings: text-embedding-3-large for semantic search
- Sparse retrieval: BM25 for keyword matching
- Vector store: Pinecone with namespace-based multi-tenancy
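The write-up doesn't specify how the dense and sparse result lists are merged. One common fusion method is Reciprocal Rank Fusion (RRF), sketched below as an assumption about how such a hybrid setup could combine rankings; the function name and `k` constant are illustrative.

```python
from collections import defaultdict

def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) per list it appears in; k = 60
    is the conventional constant that dampens the influence of low ranks.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs no score normalization, which makes it a convenient default when dense cosine similarities and BM25 scores live on incomparable scales.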
Retrieval and Generation
The query pipeline applies:
- Query expansion (3 rewritten variants per query)
- Hybrid search across dense + sparse indices
- Cohere reranking (top-50 → top-8)
- Claude 3.5 Sonnet for generation with source citation
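The four stages above compose into a single pipeline. The sketch below shows that shape with the components injected as callables; in production those would wrap the embedding model, the Pinecone/BM25 hybrid search, Cohere Rerank, and the Anthropic API. The function name and signatures are hypothetical.

```python
from typing import Callable

def answer_query(
    query: str,
    expand: Callable[[str], list[str]],          # -> 3 rewritten variants
    search: Callable[[str], list[str]],          # hybrid dense + sparse
    rerank: Callable[[str, list[str]], list[str]],
    generate: Callable[[str, list[str]], str],   # LLM call with citations
) -> str:
    """Run expand -> search -> rerank -> generate for one user query."""
    variants = [query] + expand(query)
    candidates: list[str] = []
    seen: set[str] = set()
    for variant in variants:                     # union results, dedup
        for doc in search(variant):
            if doc not in seen:
                seen.add(doc)
                candidates.append(doc)
    top = rerank(query, candidates)[:8]          # top-50 -> top-8
    return generate(query, top)
```

Reranking against the original query (not the expanded variants) keeps the final context anchored to what the user actually asked.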
Results
- 87% reduction in time spent searching for document answers
- 2M+ pages processed monthly with 99.2% uptime
- 4.2/5 user satisfaction score across 500+ daily active users
- < 2.8s response time at P95
Tech Stack
- Orchestration: Apache Airflow
- Backend: FastAPI + Python
- Vector DB: Pinecone
- LLM: Claude 3.5 Sonnet (Anthropic API)
- Reranking: Cohere Rerank
- Frontend: Next.js
- Infrastructure: AWS (ECS, RDS, S3)