2025-09-14 – Track 3
Many retrieval-augmented generation (RAG) and code-search pipelines rely on ad-hoc checks and break when deployed at scale.
This talk presents an evaluation-first development workflow applied to a production code-search engine built with Python, PostgreSQL (pgvector), and OpenAI rerankers. Introducing automated evaluation suites before optimisation cut average query latency from 20 min to 30 s, a 40× speed-up, and raised relevance by roughly 30%. We will cover:
- Constructing task-specific evaluation datasets and metrics
- Hybrid (lexical + ANN) retrieval
- Cross-encoder reranking for precision boosts
- Semantic caching strategies that keep indexes fresh and queries fast
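As a concrete illustration of the score-fusion step in hybrid retrieval, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a lexical (BM25) ranking with an ANN ranking. The function name and sample documents are illustrative, not taken from the reference implementation:

```python
def rrf_fuse(lexical_ranking, ann_ranking, k=60):
    """Fuse two ranked lists of doc IDs via Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the rankings it
    appears in; k=60 is the constant used in the original RRF paper.
    """
    scores = {}
    for ranking in (lexical_ranking, ann_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# BM25 and ANN often disagree; documents near the top of both win.
bm25 = ["doc_a", "doc_b", "doc_c"]
ann = ["doc_c", "doc_a", "doc_d"]
print(rrf_fuse(bm25, ann))  # doc_a first: it ranks high in both lists
```

RRF is attractive here because it needs no score normalisation — BM25 and cosine-similarity scores live on incompatible scales, but ranks are always comparable.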
The session includes benchmark results, a live demonstration, and an MIT-licensed reference implementation that attendees can clone and extend.
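To picture the semantic-caching idea from the topic list above: instead of keying a cache on exact query strings, key it on query embeddings and serve a cached answer when a new query is close enough in embedding space. This is a hedged sketch, not the talk's implementation; the class name, tiny hand-written embeddings, and threshold are all illustrative:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Serve a cached answer when a new query embedding is within
    a cosine-similarity threshold of a previously seen query."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # (embedding, answer) pairs

    def get(self, embedding):
        for cached_emb, answer in self.entries:
            if cosine(embedding, cached_emb) >= self.threshold:
                return answer  # cache hit: skip retrieval + rerank
        return None  # cache miss: run the full pipeline

    def put(self, embedding, answer):
        self.entries.append((embedding, answer))

cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0, 0.0], "cached answer")
# A paraphrase with a nearly identical embedding hits the cache:
print(cache.get([0.99, 0.05, 0.0]))  # "cached answer"
# An unrelated query misses:
print(cache.get([0.0, 1.0, 0.0]))    # None
```

A production version would store embeddings in pgvector and add invalidation so the cache stays fresh as the index changes — the trade-off the talk's caching section addresses.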
| Time | Section |
|---|---|
| 0–3 min | Introduction |
| 3–6 min | Why evals first: avoiding invisible failures |
| 6–9 min | Chunking strategies: myths, length/overlap heuristics, and findings |
| 9–12 min | Hybrid retrieval: BM25 + ANN + score fusion |
| 12–16 min | Rerankers: what works, when, and why |
| 16–19 min | Semantic caching: avoid redundant computation, stay fresh |
| 19–23 min | Live demo: see relevance jump |
| 23–25 min | Cheatsheet + repo walkthrough |
| 25–30 min | Q&A |
Beginner
Prerequisites:
- Basic proficiency in Python
- Prior use of a database (exposure to vector databases is helpful but not required)
Saksham Aggarwal is a founder and engineer building AI agents that automate engineering grunt work, starting with SDK integrations so that product teams can move faster.
He was the first engineer and a founding member at PYOR, a Castle Island–backed financial data startup, where he built an enterprise data terminal for on-chain analytics.
Saksham has also driven growth at Flint (now LogX) and scaled Conquest, India’s largest student-run startup accelerator.
He’s passionate about agentic retrieval systems, programmable prompts, evals (text/image/video), interaction design, and synthetic data pipelines.