Overview
A leading Fortune 500 insurance company was drowning in documents. Policy manuals, claims reports, regulatory filings, underwriting guidelines — terabytes of institutional knowledge locked in PDFs and Word files, inaccessible to the 2,000+ employees who needed it daily.
We built them a multi-tenant RAG platform that now processes over 2 million pages monthly, enabling instant, accurate answers from their entire document corpus.
The Challenge
- 4TB+ of legacy documents spanning 20+ years
- 12 distinct business units with different access controls
- Strict regulatory requirements around data residency
- Sub-3-second response time requirement
Our Approach
Document Ingestion Pipeline
We built an automated ingestion system using Apache Airflow that:
- Monitors designated SharePoint and S3 locations
- Classifies documents by type (policy, claims, regulatory) using a fine-tuned classifier
- Extracts text with layout-aware parsing (preserving tables, headers)
- Applies semantic chunking with parent-child relationships
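The parent-child chunking step above can be sketched as follows. This is an illustrative sketch only: the class names, ID scheme, and chunk size are assumptions, not the production implementation, and the real system splits on semantic boundaries rather than fixed character offsets.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chunk:
    chunk_id: str
    text: str
    parent_id: Optional[str] = None  # child chunks point back to their parent

def chunk_document(doc_id: str, sections: list[str],
                   child_size: int = 400) -> list[Chunk]:
    """Split each section (parent) into small child chunks.

    Child chunks are what get embedded and retrieved; the parent chunk
    is what gets passed to the LLM as fuller context at answer time.
    """
    chunks: list[Chunk] = []
    for s_idx, section in enumerate(sections):
        parent_id = f"{doc_id}-p{s_idx}"
        chunks.append(Chunk(parent_id, section))  # parent: full section
        for offset in range(0, len(section), child_size):
            chunks.append(Chunk(
                f"{parent_id}-c{offset // child_size}",
                section[offset:offset + child_size],
                parent_id=parent_id,
            ))
    return chunks
```

Retrieval then matches against the small child chunks for precision, but expands to the parent for generation, which is the main benefit of the parent-child layout.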
Embedding and Indexing
We chose a hybrid approach:
- Dense embeddings: text-embedding-3-large for semantic search
- Sparse retrieval: BM25 for keyword matching
- Vector store: Pinecone with namespace-based multi-tenancy
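The write-up doesn't specify how the dense and sparse result lists are merged. One common fusion method is Reciprocal Rank Fusion (RRF), sketched below as an assumption about how such a hybrid setup could combine rankings; the function name and `k` constant are illustrative.

```python
from collections import defaultdict

def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) per list it appears in; k = 60
    is the conventional constant that dampens the influence of low ranks.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs no score normalization, which makes it a convenient default when dense cosine similarities and BM25 scores live on incomparable scales.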
Retrieval and Generation
The query pipeline applies:
- Query expansion (3 rewritten variants per query)
- Hybrid search across dense + sparse indices
- Cohere reranking (top-50 → top-8)
- Claude 3.5 Sonnet for generation with source citation
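The four stages above compose into a single pipeline. The sketch below shows that shape with the components injected as callables; in production those would wrap the embedding model, the Pinecone/BM25 hybrid search, Cohere Rerank, and the Anthropic API. The function name and signatures are hypothetical.

```python
from typing import Callable

def answer_query(
    query: str,
    expand: Callable[[str], list[str]],          # -> 3 rewritten variants
    search: Callable[[str], list[str]],          # hybrid dense + sparse
    rerank: Callable[[str, list[str]], list[str]],
    generate: Callable[[str, list[str]], str],   # LLM call with citations
) -> str:
    """Run expand -> search -> rerank -> generate for one user query."""
    variants = [query] + expand(query)
    candidates: list[str] = []
    seen: set[str] = set()
    for variant in variants:                     # union results, dedup
        for doc in search(variant):
            if doc not in seen:
                seen.add(doc)
                candidates.append(doc)
    top = rerank(query, candidates)[:8]          # top-50 -> top-8
    return generate(query, top)
```

Reranking against the original query (not the expanded variants) keeps the final context anchored to what the user actually asked.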
Results
- 87% reduction in time spent searching for document answers
- 2M+ pages processed monthly with 99.2% uptime
- 4.2/5 user satisfaction score across 500+ daily active users
- < 2.8s response time at P95
Tech Stack
- Orchestration: Apache Airflow
- Backend: FastAPI + Python
- Vector DB: Pinecone
- LLM: Claude 3.5 Sonnet (Anthropic API)
- Reranking: Cohere Rerank
- Frontend: Next.js
- Infrastructure: AWS (ECS, RDS, S3)