AI Infrastructure

Building Production-Grade RAG Pipelines

Designing a resilient retrieval-augmented generation platform for multi-tenant education workloads.

February 20, 2024 · 10 min read
RAG · AWS · pgvector · Multi-tenant

Domain-Targeted Indexing

Curriculum content is uneven: some districts provide well-structured PDFs while others upload photos of whiteboards. We built a normalization pipeline using Amazon Textract and its response parser library (trp) to convert everything into semantic chunks enriched with grade level, subject metadata, and reading difficulty. Chunks with low OCR confidence were routed to a manual validation queue so poor inputs never poisoned the embeddings.
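A minimal sketch of that confidence gate, assuming each chunk already carries the per-line confidence scores that Textract reports on a 0 to 100 scale. The Chunk type, the 85.0 threshold, and the choice to score a chunk by its weakest line are illustrative assumptions, not details of the production pipeline.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

MIN_CONFIDENCE = 85.0  # assumed threshold; tune against real validation data


@dataclass
class Chunk:
    """Hypothetical chunk record: text, OCR confidences, and curriculum metadata."""
    text: str
    line_confidences: List[float]  # per-line OCR confidence, 0-100
    metadata: Dict[str, object] = field(default_factory=dict)

    @property
    def confidence(self) -> float:
        # Score by the weakest line so one garbled line is enough to trigger review.
        # A chunk with no lines scores 0.0 and is always reviewed.
        return min(self.line_confidences, default=0.0)


def route_chunks(chunks: List[Chunk]) -> Tuple[List[Chunk], List[Chunk]]:
    """Split chunks into (ready_to_embed, needs_manual_review)."""
    ready: List[Chunk] = []
    review: List[Chunk] = []
    for chunk in chunks:
        if chunk.confidence >= MIN_CONFIDENCE:
            ready.append(chunk)
        else:
            review.append(chunk)
    return ready, review
```

Only the first list would be passed on to embedding; the second would feed the manual validation queue.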

Once normalized, documents flowed into a dual index in Postgres: a dense pgvector embedding index for semantic search and a lightweight inverted index for keyword fallback. The combination let us recover from noisy student queries without over-relying on hallucination-prone language models.
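The fallback logic does not depend on the storage layer, so the sketch below uses in-memory stand-ins for both indexes. The 0.75 similarity cutoff, the function names, and the whitespace tokenizer are assumptions for illustration; a real deployment would run the equivalent queries against the database.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

MIN_SIMILARITY = 0.75  # assumed cutoff below which dense hits are not trusted


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase token to the IDs of the documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def retrieve(
    query_text: str,
    query_vec: List[float],
    vectors: Dict[str, List[float]],
    inverted: Dict[str, Set[str]],
    k: int = 3,
) -> Tuple[List[str], str]:
    """Return (doc_ids, source): dense hits if any clear the cutoff, else keyword hits."""
    scored = sorted(
        ((cosine(query_vec, vec), doc_id) for doc_id, vec in vectors.items()),
        reverse=True,
    )
    dense = [doc_id for score, doc_id in scored[:k] if score >= MIN_SIMILARITY]
    if dense:
        return dense, "dense"

    # Fallback: rank documents by how many query tokens they contain.
    counts: Dict[str, int] = defaultdict(int)
    for token in query_text.lower().split():
        for doc_id in inverted.get(token, ()):
            counts[doc_id] += 1
    ranked = sorted(counts, key=lambda d: (-counts[d], d))
    return ranked[:k], "keyword"
```

Returning the source alongside the IDs makes it easy to measure how often the keyword path is actually taken.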

Tenant Isolation and Guardrails

Every school district maps to its own logical tenant with isolated S3 buckets, pgvector schemas, and IAM roles provisioned via Terraform. Requests carry a signed tenant token that our API gateway validates before routing traffic. Feature flags allow districts to opt into beta features without risking cross-tenant leakage.
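As a rough illustration of the token check, the sketch below signs and verifies a tenant token with HMAC-SHA256 from the Python standard library. The token layout (tenant ID, expiry, signature joined by dots), the shared secret, and both function names are assumptions; a production gateway might just as well use JWTs or a managed authorizer.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_tenant_token(
    tenant_id: str, secret: bytes, ttl_seconds: int = 300, now: Optional[float] = None
) -> str:
    """Build a hypothetical '<tenant_id>.<expiry>.<signature>' token."""
    issued = time.time() if now is None else now
    payload = f"{tenant_id}.{int(issued + ttl_seconds)}"
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def verify_tenant_token(
    token: str, secret: bytes, now: Optional[float] = None
) -> Optional[str]:
    """Return the tenant ID if the token is authentic and unexpired, else None."""
    try:
        # rsplit keeps tenant IDs that contain dots intact.
        tenant_id, expires, signature = token.rsplit(".", 2)
        expires_at = int(expires)
    except ValueError:
        return None  # malformed token

    payload = f"{tenant_id}.{expires}"
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None

    current = time.time() if now is None else now
    if current >= expires_at:
        return None  # expired
    return tenant_id
```

The gateway would route on the returned tenant ID only, never on a tenant identifier supplied elsewhere in the request.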

We also instituted a “content safety” layer that runs retrieval results through a toxicity classifier and a policy rules engine. Anything flagged is replaced with a standard apology template so flagged content never reaches the classroom.
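The shape of that layer can be sketched as a filter over retrieved passages. The classifier and the policy rules are passed in as plain callables because the article does not name the actual components; the Passage type, the 0.5 threshold, and the apology wording are likewise placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

APOLOGY = "Sorry, this material is not available right now. Please ask your teacher for help."
TOXICITY_THRESHOLD = 0.5  # assumed cutoff on a 0-1 toxicity score


@dataclass(frozen=True)
class Passage:
    """Hypothetical retrieval result."""
    doc_id: str
    text: str


def apply_content_safety(
    passages: Iterable[Passage],
    toxicity_score: Callable[[str], float],
    policy_rules: List[Callable[[str], bool]],
) -> List[Passage]:
    """Replace the text of any flagged passage with the apology template."""
    safe: List[Passage] = []
    for passage in passages:
        flagged = toxicity_score(passage.text) >= TOXICITY_THRESHOLD or any(
            rule(passage.text) for rule in policy_rules
        )
        # Keep the doc_id so the replacement is still traceable in logs.
        safe.append(Passage(passage.doc_id, APOLOGY) if flagged else passage)
    return safe
```

Because flagged passages are replaced rather than dropped, downstream code sees the same number of results and the audit trail still shows which document was withheld.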

Runtime Observability

Claude inference on Amazon Bedrock is the most expensive portion of the pipeline, so we aggressively cached intermediate retrieval results in Redis using a semantic cache key. Downstream, we tracked latency percentiles per tenant and per question type, exposing dashboards that made it obvious when a new set of materials caused embedding drift.
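One simple way to approximate a semantic cache key is to normalize the query text before hashing it, so trivially different phrasings of the same question share an entry. The sketch below takes that approach and uses a plain dict as a stand-in for Redis; the key format, the normalization rules, and the function names are assumptions, and an embedding-based key would be a reasonable alternative.

```python
import hashlib
import re
from typing import Callable, Dict, List


def semantic_cache_key(tenant_id: str, query: str) -> str:
    """Build a tenant-scoped key from a case- and punctuation-insensitive query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    digest = hashlib.sha256(" ".join(tokens).encode()).hexdigest()
    # The tenant ID is part of the key so cached results never cross tenants.
    return f"retrieval:{tenant_id}:{digest}"


def cached_retrieve(
    cache: Dict[str, List[str]],
    tenant_id: str,
    query: str,
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Return cached document IDs, calling `retrieve` only on a cache miss."""
    key = semantic_cache_key(tenant_id, query)
    if key not in cache:
        cache[key] = retrieve(query)
    return cache[key]
```

With this scheme "What is photosynthesis?" and "what is  PHOTOSYNTHESIS" map to the same entry for a given tenant, while the same question from another tenant misses the cache.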

Finally, we wired structured logging all the way from ingestion to inference. Each trace includes the document IDs used to build the final answer so teachers can audit where an explanation came from. This was the feature that unlocked district-wide approval.
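A trace record of this kind might look like the sketch below, which emits one JSON object per log line using only the standard library. The field names, the logger name, and the helper functions are assumptions chosen for illustration.

```python
import json
import logging
import uuid
from typing import Iterable, Optional

logger = logging.getLogger("rag.trace")  # hypothetical logger name


def build_trace_record(
    tenant_id: str,
    question: str,
    document_ids: Iterable[str],
    stage: str = "inference",
    trace_id: Optional[str] = None,
) -> dict:
    """Assemble a structured trace entry that records which documents backed an answer."""
    return {
        "trace_id": trace_id or str(uuid.uuid4()),
        "tenant_id": tenant_id,
        "stage": stage,
        "question": question,
        "document_ids": list(document_ids),
    }


def log_trace(record: dict) -> str:
    """Serialize the record as a single JSON line, log it, and return the line."""
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line
```

Reusing the same trace_id across ingestion, retrieval, and inference entries is what lets an auditor follow one answer back to its source documents.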

Key takeaways

  • Hybrid semantic + keyword indexes kept recall above 90% without over-relying on hallucination-prone language models.
  • Strict tenant-aware IAM policies prevented accidental data crossover during peak enrollment.
  • Full-funnel observability made it safe to ship weekly model and prompt improvements.
