
Enterprise RAG Platform

87% reduction in manual document review time

2M+ pages processed monthly

99.2% uptime over 12 months

Overview

A leading Fortune 500 insurance company was drowning in documents. Policy manuals, claims reports, regulatory filings, underwriting guidelines — terabytes of institutional knowledge locked in PDFs and Word files, inaccessible to the 2,000+ employees who needed it daily.

We built them a multi-tenant RAG platform that now processes over 2 million pages monthly, enabling instant, accurate answers from their entire document corpus.

The Challenge

  • 4TB+ of legacy documents spanning 20+ years
  • 12 distinct business units with different access controls
  • Strict regulatory requirements around data residency
  • Sub-3-second response time requirement

Our Approach

Document Ingestion Pipeline

We built an automated ingestion system using Apache Airflow that:

  1. Monitors designated SharePoint and S3 locations
  2. Classifies documents by type (policy, claims, regulatory) using a fine-tuned classifier
  3. Extracts text with layout-aware parsing (preserving tables, headers)
  4. Applies semantic chunking with parent-child relationships
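The parent-child chunking in step 4 can be sketched as follows. This is a minimal illustration, not the production code: it splits on fixed character windows for brevity (the real pipeline splits on semantic boundaries), and the chunk sizes are placeholder values. Small child chunks are what get embedded for retrieval; each one carries a pointer back to its larger parent, which is what ultimately gets handed to the LLM for fuller context.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    text: str
    chunk_id: str
    parent_id: Optional[str] = None  # children point back to their parent


def chunk_document(text: str, parent_size: int = 2000, child_size: int = 400) -> List[Chunk]:
    """Split a document into large parent chunks, then split each parent
    into small child chunks that reference their parent's ID.
    Fixed-size windows stand in for semantic boundary detection here."""
    chunks: List[Chunk] = []
    for p_start in range(0, len(text), parent_size):
        parent_text = text[p_start:p_start + parent_size]
        parent = Chunk(text=parent_text, chunk_id=str(uuid.uuid4()))
        chunks.append(parent)
        for c_start in range(0, len(parent_text), child_size):
            child = Chunk(
                text=parent_text[c_start:c_start + child_size],
                chunk_id=str(uuid.uuid4()),
                parent_id=parent.chunk_id,
            )
            chunks.append(child)
    return chunks
```

At query time, matches against child chunks are resolved to their parents before generation, which keeps retrieval precise while keeping the generation context coherent.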

Embedding and Indexing

We chose a hybrid approach:

  • Dense embeddings: text-embedding-3-large for semantic search
  • Sparse retrieval: BM25 for keyword matching
  • Vector store: Pinecone with namespace-based multi-tenancy
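Merging the dense and sparse result lists requires a fusion step. The case study doesn't specify which method was used, so the sketch below uses reciprocal rank fusion (RRF), a common choice for hybrid search: each document is scored by summing 1/(k + rank) across the rankings it appears in, rewarding documents that rank well in either index.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse multiple ranked lists of document IDs into a single ranking.
    k=60 is the constant proposed in the original RRF paper; it damps the
    influence of top-ranked outliers from any single list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Illustrative rankings from the two indices:
dense = ["d3", "d1", "d2"]   # semantic (embedding) order
sparse = ["d1", "d4", "d3"]  # BM25 keyword order
fused = reciprocal_rank_fusion([dense, sparse])
# d1 and d3 appear in both lists, so they rise to the top:
# ['d1', 'd3', 'd4', 'd2']
```

RRF needs only ranks, not scores, which sidesteps the problem of normalizing cosine similarities against BM25 scores.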

Retrieval and Generation

The query pipeline applies:

  1. Query expansion (3 rewritten variants per query)
  2. Hybrid search across dense + sparse indices
  3. Cohere reranking (top-50 → top-8)
  4. Claude 3.5 Sonnet for generation with source citation
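Step 4 depends on giving the model numbered sources it can cite. A minimal sketch of that context assembly is below; the field names and prompt wording are illustrative, not the production prompt.

```python
from typing import Dict, List


def build_cited_prompt(question: str, chunks: List[Dict[str, str]]) -> str:
    """Number each retrieved chunk and instruct the model to cite by
    bracketed number, so every claim in the answer can be traced back
    to a source document."""
    sources = "\n\n".join(
        f"[{i}] ({c['source']}) {c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their bracketed number, e.g. [2]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )
```

The top-8 reranked chunks from step 3 would be passed in as `chunks`, and the returned string sent to the model as the user message.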

Results

  • 87% reduction in time spent searching for document answers
  • 2M+ pages processed monthly with 99.2% uptime
  • 4.2/5 user satisfaction score across 500+ daily active users
  • < 2.8s response time at P95, within the sub-3-second requirement

Tech Stack

  • Orchestration: Apache Airflow
  • Backend: FastAPI + Python
  • Vector DB: Pinecone
  • LLM: Claude 3.5 Sonnet (Anthropic API)
  • Reranking: Cohere Rerank
  • Frontend: Next.js
  • Infrastructure: AWS (ECS, RDS, S3)