PyCon India 2025

Evals First, Code Later: A Practical Guide to Evaluations, Rerankers & Caches
2025-09-14, Track 3

Many retrieval-augmented generation (RAG) and code-search pipelines rely on ad-hoc checks and break when deployed at scale.

This talk presents an evaluation-first development workflow applied to a production code-search engine built with Python, PostgreSQL (pgvector), and OpenAI rerankers. Introducing automated evaluation suites before optimising cut average query latency from 20 min to 30 s (a 40× speed-up) and raised relevance by ≈30%. We will cover:

  • Constructing task-specific evaluation datasets and metrics
  • Hybrid (lexical + ANN) retrieval
  • Cross-encoder reranking for precision boosts
  • Semantic caching strategies that keep indexes fresh and queries fast
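As a taste of the evaluation-first approach in the first bullet, a minimal recall@k harness might look like the sketch below (the eval set, document IDs, and `search_fn` interface are illustrative assumptions, not taken from the talk's repository):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Tiny illustrative eval set: query -> ids of documents judged relevant.
EVAL_SET = {
    "parse json safely": {"doc_12", "doc_47"},
    "retry with backoff": {"doc_03"},
}

def evaluate(search_fn, k: int = 5) -> float:
    """Average recall@k of `search_fn` (query -> ranked doc ids) over the eval set."""
    scores = [
        recall_at_k(search_fn(query), relevant, k)
        for query, relevant in EVAL_SET.items()
    ]
    return sum(scores) / len(scores)
```

Running `evaluate` before and after each pipeline change is what turns "it feels better" into a number that can be tracked.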

The session includes benchmark results, a live demonstration, and an MIT-licensed reference implementation that attendees can clone and extend.
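The score-fusion step of hybrid retrieval can be as simple as reciprocal rank fusion (RRF), a common baseline for merging BM25 and ANN result lists; this sketch is a generic illustration, not necessarily the exact method used in the talk's engine:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists (e.g. BM25 and ANN results) into one.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    k = 60 is the constant commonly used with RRF.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF works on ranks rather than raw scores, it sidesteps the problem that BM25 scores and vector distances live on incompatible scales.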


Time Section
0–3 min Introduction
3–6 min Why Evals First: avoiding invisible failures
6–9 min Chunking strategies: myths, length/overlap, heuristics, and findings
9–12 min Hybrid retrieval: BM25 + ANN + score fusion
12–16 min Rerankers: what works, when, and why
16–19 min Semantic caching: avoid redundant computation, stay fresh
19–23 min Live demo: see relevance jump
23–25 min Cheatsheet + Repo walkthrough
25–30 min Q&A

Target Audience

Beginner

Prerequisites
  • Basic proficiency in Python
  • Prior use of a database (exposure to vector DB is helpful but not required)