2025-09-14, Track 3
Building efficient RAG chatbots, especially over complex data, presents several bottlenecks. This talk shares our use case demonstrating how four core innovations achieved significant performance gains:
1. Accelerated Ingestion: an asyncio-powered crawler reduced website crawl time from 5h 18m to just 40m, a roughly 8× speedup.
2. Complex Data Handling: Multi-Modal Chunking enables seamless ingestion of documents with intricate text, images, and tables.
3. Improved Retrieval: Hybrid Re-Ranking (considering website metrics) ensures more efficient and relevant information retrieval.
4. Optimized Responses: Classifying and rephrasing user queries generates concise, context-aware responses with minimal tokens/latency.
These integrated strategies resulted in a highly performant and valuable RAG chatbot.
Description:
Struggling to build RAG chatbots that are fast, accurate, and can handle complex documents containing images and tables? Traditional RAG pipelines often hit bottlenecks in data ingestion and retrieval efficiency, leading to slow updates, irrelevant responses, and high token costs.
This talk unveils a robust framework for building high-efficiency chatbots using Retrieval-Augmented Generation (RAG) with a hybrid re-ranking strategy, demonstrated through a real-world use case. We'll showcase innovations across the pipeline:
- Accelerating Data Ingestion: See how our asyncio-powered scraper processes websites concurrently, dramatically reducing crawl time by roughly 8× (5h 18m → 40m).
- Handling Complex Documents: Learn our Multi-Modal Chunking technique for seamlessly ingesting documents containing intricate text, images, and tables, preparing them effectively for RAG.
- Optimizing Query Workflow: Understand how classifying and rephrasing user queries improves the initial retrieval stage for better relevance.
- Implementing Hybrid Re-Ranking: Dive into our core Hybrid Re-Ranking mechanism that combines semantic relevance with structural metadata (like site depth and authority) to surface the most pertinent passages from potentially noisy initial search results.
- Building the Full Pipeline: Get a clear, step-by-step blueprint for integrating these techniques to minimize token usage, accelerate response times, and consistently deliver precise, context-aware answers.
Attendees will leave with practical insights and a framework to significantly enhance the performance and capability of their own RAG implementations.
Session Outline:
- Introduction & Motivation (1 min)
- Challenges with traditional chatbots: high token usage, latency, generic responses
- Objectives: boosting efficiency and accuracy with RAG + re-ranking
- Standard RAG Pipeline Overview (3 min)
- Chunk → embed → vector store
- Document-type chunking (HTML vs. PDF) and multi-modal ingestion of images/tables
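As a rough illustration of the chunk → embed → vector store flow, here is a minimal sketch; the chunking and embedding functions are toy stand-ins, not the speakers' actual code, and a real pipeline would chunk by document structure and use a real embedding model.

```python
import math

def chunk(text: str, size: int = 200) -> list[str]:
    """Split text into fixed-size character chunks (real pipelines
    chunk by structure: headings, tables, images)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(piece: str) -> list[float]:
    """Toy bag-of-characters embedding, L2-normalised; swap in a real model."""
    vec = [0.0] * 26
    for ch in piece.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def ingest(doc: str) -> list[tuple[str, list[float]]]:
    """Chunk a document and pair each chunk with its embedding;
    the returned list plays the role of the vector store."""
    return [(c, embed(c)) for c in chunk(doc)]

store = ingest("Retrieval-Augmented Generation grounds LLM answers in retrieved context.")
```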
- Web Scraping Abstraction (5 min)
- Async, scheduler-driven graph traversal over a website’s HTML & PDF content (~8× faster than synchronous scraping: 5h 18m → 40m)
- Parsing dynamic webpages and complex PDFs (including images and tables) and extracting their content
- Key metadata captured (crawl depth, doc type, nav hierarchy)
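The concurrency win comes from fetching many pages at once instead of one at a time. A minimal asyncio sketch of the idea, with a semaphore bounding concurrency; the URLs and the simulated fetch are illustrative, and the speakers' scraper adds scheduling, depth tracking, and PDF parsing on top of this pattern.

```python
import asyncio

async def fetch(url: str, sem: asyncio.Semaphore) -> tuple[str, str]:
    """Fetch one page; a real crawler would use an HTTP client such as aiohttp here."""
    async with sem:                 # cap in-flight requests
        await asyncio.sleep(0.01)   # stand-in for network I/O
        return url, f"<html>content of {url}</html>"

async def crawl(urls: list[str], max_concurrency: int = 10) -> dict[str, str]:
    """Fetch all URLs concurrently and map each to its content."""
    sem = asyncio.Semaphore(max_concurrency)
    results = await asyncio.gather(*(fetch(u, sem) for u in urls))
    return dict(results)

pages = asyncio.run(crawl([f"https://example.com/page{i}" for i in range(50)]))
```

Because the sleeps overlap, the 50 fetches complete in roughly 5 batches rather than 50 sequential waits, which is the same effect that collapsed the 5h 18m crawl.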
- Enhanced LLM Interaction Workflow (5 min)
- Classifying queries (gibberish, general chat, site-specific)
- Rewriting into semantically rich variants for broader recall
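Classification and rewriting can be sketched with simple heuristics standing in for the LLM calls; the three categories mirror the bullets above, while the rules and variant templates are illustrative assumptions.

```python
def classify(query: str) -> str:
    """Route a query into one of: gibberish, general_chat, site_specific."""
    q = query.strip()
    if len(q) < 3 or not any(ch.isalpha() for ch in q):
        return "gibberish"
    if q.lower().rstrip("?!") in {"hi", "hello", "thanks", "how are you"}:
        return "general_chat"
    return "site_specific"

def rewrite(query: str) -> list[str]:
    """Expand a site-specific query into semantically richer variants
    for broader recall (a real system would prompt an LLM for these)."""
    q = query.strip().rstrip("?")
    return [q,
            f"What does the site say about {q}?",
            f"{q} details and documentation"]
```

Routing gibberish and small talk away from retrieval is what saves tokens: only site-specific queries pay the cost of embedding, search, and re-ranking.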
- Hybrid Re-Ranking Strategy (5 min)
- Scoring snippets by semantic relevance, page-level weight, domain authority
- Applying weighted formula (α, β, γ) to surface the most pertinent chunks
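The weighted formula amounts to score = α·semantic + β·page_weight + γ·authority. A small sketch; the weight values and field names are illustrative assumptions, since the talk's actual α, β, γ are not given here.

```python
def rerank(snippets: list[dict], alpha: float = 0.6,
           beta: float = 0.25, gamma: float = 0.15) -> list[dict]:
    """Re-rank snippets by a weighted blend of semantic relevance,
    page-level weight (e.g. inverse crawl depth), and domain authority.
    Each snippet dict carries scores already normalised to [0, 1]."""
    def score(s: dict) -> float:
        return (alpha * s["semantic"]
                + beta * s["page_weight"]
                + gamma * s["authority"])
    return sorted(snippets, key=score, reverse=True)

snippets = [
    {"id": "deep-page", "semantic": 0.9, "page_weight": 0.2, "authority": 0.3},
    {"id": "home-page", "semantic": 0.7, "page_weight": 0.9, "authority": 0.9},
]
ranked = rerank(snippets)
```

Here the structural metadata flips the ranking: the slightly less semantically similar but shallower, higher-authority page wins (0.78 vs. 0.635), which is exactly how hybrid re-ranking filters noisy initial search results.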
- Comparison: Traditional vs. Our RAG-Driven Approach (3 min)
- Traditional: raw LLM calls → token bloat, context loss
- Ours: RAG + hybrid re-ranking → targeted retrieval, cost-effective precision
- Demonstration & Insights (3 min)
- End-to-end flow: user query → initial search → re-ranking → final LLM answer
- Tips on balancing accuracy, latency, and token budget
- Conclusion & Q&A (5 min)
- Applying this pipeline in your own projects
- Open floor for questions and future directions
Prerequisites:
Attendees should have a basic understanding of:
- Python 3.8+ and familiarity with asyncio for concurrent programming.
- Large Language Models (LLMs) and text embeddings.
- Experience with vector databases (e.g., Pinecone, FAISS) or general retrieval systems.
- General familiarity with REST APIs and prompt engineering best practices.
Intermediate
Charan Teja C S
AI/ML Engineer, OSI Digital
Currently designing and building Python FastAPI backends and AI/ML-driven solutions, Charan is a language-agnostic developer who tackles problems with a fundamentals-first mindset.
A graduate of Osmania University (B.E. CSE), he’s active in competitive programming and, during an internship, developed a face-recognition module.
His academic capstone—a FastAPI-powered robotic greenhouse monitor using ML for leaf-disease detection—demonstrated his full-stack and AI expertise.
At OSI Digital, Charan has delivered two highly praised chatbots (HR policy and website inquiry), holds AWS certifications with Docker deployment experience, and continues to deepen his strengths in AI/ML and cloud computing.
Architect | Product Manager | Engineering Leader | AI
Srikanth is an Architect, Product Manager, and Engineering Lead at OSI Digital Pvt Ltd.
He leads an exceptional team of 20 engineers, crafting innovative solutions that shape the future, one project at a time.
What I Do:
As an Architect and AI Innovator, I specialize in turning ideas into scalable, cutting-edge solutions across AWS, Azure, and GCP. From building microservices and cloud-native systems to powering real-time data streams with NodeJS, Python, and AWS IoT Core, I bring tech to life.
AI is my playground – whether it's AWS Rekognition, Google Vision AI, or GenAI, I’m constantly designing smarter, more intuitive technologies.
Leading a team of talented engineers, I’m dedicated to mentoring, empowering, and delivering high-impact results. Whether it’s creating sleek apps with React, Angular, and Ionic, or building interactive chatbots using DialogFlow and Watson, I’m always pushing the boundaries of what’s possible.
Let’s connect and build something extraordinary – srikanth.cloudarch@gmail.com
Explore more about our work at OSI Digital Pvt Ltd