2025-09-14 – Track 3
Many retrieval-augmented generation (RAG) and code-search pipelines rely on ad-hoc checks and break when deployed at scale.
This talk presents an evaluation-first development workflow applied to a production code-search engine built with Python, PostgreSQL (pgvector), and OpenAI rerankers. Introducing automated evaluation suites before optimisation cut average query latency from 20 minutes to 30 seconds (a 40× speed-up) and raised relevance by roughly 30 %. We will cover:
- Constructing task-specific evaluation datasets and metrics (see the sketch after this list)
- Hybrid (lexical + ANN) retrieval with score fusion (sketched after the abstract)
- Cross-encoder reranking for precision boosts
- Semantic caching strategies that keep indexes fresh and queries fast
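To give a flavour of the evaluation-first approach, here is a minimal sketch of a retrieval evaluation harness computing recall@k and MRR over a small labelled query set. The dataset format and function names are illustrative assumptions, not the talk's reference implementation.

```python
# Minimal retrieval-eval sketch: recall@k and MRR over a labelled query set.
# `search` is a placeholder for any retriever returning ranked document ids.
from typing import Callable, Iterable

def recall_at_k(ranked: list, relevant: set, k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / max(len(relevant), 1)

def reciprocal_rank(ranked: list, relevant: set) -> float:
    """1 / rank of the first relevant result (0 if none was retrieved)."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(search: Callable[[str], list],
             dataset: Iterable[tuple], k: int = 10) -> dict:
    """Run every labelled (query, relevant_ids) pair through `search` and average the metrics."""
    recalls, rrs = [], []
    for query, relevant in dataset:
        ranked = search(query)
        recalls.append(recall_at_k(ranked, relevant, k))
        rrs.append(reciprocal_rank(ranked, relevant))
    n = max(len(recalls), 1)
    return {"recall@k": sum(recalls) / n, "mrr": sum(rrs) / n}
```

Running a harness like this before and after each change is what makes regressions visible instead of invisible.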
The session includes benchmark results, a live demonstration, and an MIT-licensed reference implementation that attendees can clone and extend.
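To make the hybrid-retrieval bullet concrete, here is a minimal sketch of lexical + ANN retrieval fused with reciprocal rank fusion (RRF). It assumes a hypothetical `chunks(id, content, embedding)` table with a pgvector column and uses psycopg; the schema, SQL, and parameters are illustrative, not the talk's reference implementation.

```python
# Hybrid retrieval sketch: Postgres full-text search + pgvector ANN,
# combined with reciprocal rank fusion (RRF). Table/column names are assumed.
import psycopg

LEXICAL_SQL = """
    SELECT id
    FROM chunks
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %(q)s)
    ORDER BY ts_rank_cd(to_tsvector('english', content),
                        plainto_tsquery('english', %(q)s)) DESC
    LIMIT %(k)s
"""

ANN_SQL = """
    SELECT id
    FROM chunks
    ORDER BY embedding <=> %(vec)s::vector  -- cosine distance
    LIMIT %(k)s
"""

def rrf(ranked_lists: list, k: int = 60) -> list:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_search(conn: psycopg.Connection, query: str,
                  query_embedding: list, top_k: int = 20) -> list:
    """Run both retrievers and fuse their rankings."""
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(LEXICAL_SQL, {"q": query, "k": top_k})
        lexical = [row[0] for row in cur.fetchall()]
        cur.execute(ANN_SQL, {"vec": vec, "k": top_k})
        ann = [row[0] for row in cur.fetchall()]
    return rrf([lexical, ann])[:top_k]
```

RRF is used here because it needs no score normalisation across the two retrievers; weighted score fusion is a reasonable alternative when calibrated scores are available.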
Time | Section |
---|---|
0–3 min | Introduction |
3–6 min | Why Evals First: avoiding invisible failures |
6–9 min | Chunking strategies: myths, length / overlap, heuristics, & findings |
9–12 min | Hybrid retrieval: BM25 + ANN + score fusion |
12–16 min | Rerankers: what works, when, and why |
16–19 min | Semantic caching: avoid redundant computation, stay fresh (sketch below) |
19–23 min | Live demo: see relevance jump |
23–25 min | Cheatsheet + Repo walkthrough |
25–30 min | Q&A |
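For the semantic-caching segment, here is a minimal in-memory sketch of the idea: cache results keyed by query embedding, reuse them when a new query is close enough, and expire entries so the cache stays fresh. The threshold, TTL, and data structure are illustrative assumptions, not the talk's implementation.

```python
# Semantic cache sketch: reuse a cached result when a new query's embedding
# is close enough (cosine similarity) to a previously seen one, with a TTL
# so stale entries expire. All parameter values are illustrative.
import math
import time

class SemanticCache:
    def __init__(self, threshold: float = 0.95, ttl_seconds: float = 3600.0):
        self.threshold = threshold
        self.ttl = ttl_seconds
        self._entries: list = []  # (embedding, result, timestamp)

    @staticmethod
    def _cosine(a: list, b: list) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
        return dot / norm if norm else 0.0

    def get(self, query_embedding: list):
        """Return a cached result for a semantically similar query, or None."""
        now = time.time()
        self._entries = [e for e in self._entries if now - e[2] < self.ttl]  # drop stale entries
        for embedding, result, _ in self._entries:
            if self._cosine(query_embedding, embedding) >= self.threshold:
                return result
        return None

    def put(self, query_embedding: list, result) -> None:
        """Store a freshly computed result under its query embedding."""
        self._entries.append((query_embedding, result, time.time()))
```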
Beginner
Prerequisites
- Basic proficiency in Python
- Prior use of a database (exposure to a vector DB is helpful but not required)