2025-09-14, Track 3
Building efficient RAG chatbots, especially over complex data, presents several bottlenecks. This talk shares our use case demonstrating how four core innovations achieved significant performance gains:
1. Accelerated Ingestion: an asyncio-powered crawler reduced website crawl time from 5h 18m to just 40m, a roughly 8× speedup.
2. Complex Data Handling: Multi-Modal Chunking enables seamless ingestion of documents with intricate text, images, and tables.
3. Improved Retrieval: Hybrid Re-Ranking (considering website metrics) ensures more efficient and relevant information retrieval.
4. Optimized Responses: Classifying and rephrasing user queries generates concise, context-aware responses with minimal tokens/latency.
These integrated strategies resulted in a highly performant and valuable RAG chatbot.
Description:
Struggling to build RAG chatbots that are fast, accurate, and can handle complex documents containing images and tables? Traditional RAG pipelines often hit bottlenecks in data ingestion and retrieval efficiency, leading to slow updates, irrelevant responses, and high token costs.
This talk unveils a robust framework for building high-efficiency chatbots using Retrieval-Augmented Generation (RAG) with a hybrid re-ranking strategy, demonstrated through a real-world use case. We'll showcase innovations across the pipeline:
- Accelerating Data Ingestion: See how our asyncio-powered scraper processes websites concurrently, dramatically reducing crawl time by roughly 8× (5h 18m → 40m).
- Handling Complex Documents: Learn our Multi-Modal Chunking technique for seamlessly ingesting documents containing intricate text, images, and tables, preparing them effectively for RAG.
- Optimizing Query Workflow: Understand how classifying and rephrasing user queries improves the initial retrieval stage for better relevance.
- Implementing Hybrid Re-Ranking: Dive into our core Hybrid Re-Ranking mechanism that combines semantic relevance with structural metadata (like site depth and authority) to surface the most pertinent passages from potentially noisy initial search results.
- Building the Full Pipeline: Get a clear, step-by-step blueprint for integrating these techniques to minimize token usage, accelerate response times, and consistently deliver precise, context-aware answers.
Attendees will leave with practical insights and a framework to significantly enhance the performance and capability of their own RAG implementations.
Session Outline:
- Introduction & Motivation (1 min)
- Challenges with traditional chatbots: high token usage, latency, generic responses
- Objectives: boosting efficiency and accuracy with RAG + re-ranking
- Standard RAG Pipeline Overview (3 min)
- Chunk → embed → vector store
- Document-type chunking (HTML vs. PDF) and multi-modal ingestion of images/tables
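As a rough illustration of the chunk → embed → vector store flow, here is a minimal sketch; the chunking and embedding functions are toy stand-ins, not the speakers' actual code, and a real pipeline would chunk by document structure and use a real embedding model.

```python
import math

def chunk(text: str, size: int = 200) -> list[str]:
    """Split text into fixed-size character chunks (real pipelines
    chunk by structure: headings, tables, images)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(piece: str) -> list[float]:
    """Toy bag-of-characters embedding, L2-normalised; swap in a real model."""
    vec = [0.0] * 26
    for ch in piece.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def ingest(doc: str) -> list[tuple[str, list[float]]]:
    """Chunk a document and pair each chunk with its embedding;
    the returned list plays the role of the vector store."""
    return [(c, embed(c)) for c in chunk(doc)]

store = ingest("Retrieval-Augmented Generation grounds LLM answers in retrieved context.")
```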
- Web Scraping Abstraction (5 min)
- Async, scheduler-driven graph traversal over a website’s HTML & PDF content (~8× faster than synchronous scraping: 5h 18m → 40m)
- Parsing dynamic webpages and complex PDFs (including images and tables) and extracting their content
- Key metadata captured (crawl depth, doc type, nav hierarchy)
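The concurrency win comes from fetching many pages at once instead of one at a time. A minimal asyncio sketch of the idea, with a semaphore bounding concurrency; the URLs and the simulated fetch are illustrative, and the speakers' scraper adds scheduling, depth tracking, and PDF parsing on top of this pattern.

```python
import asyncio

async def fetch(url: str, sem: asyncio.Semaphore) -> tuple[str, str]:
    """Fetch one page; a real crawler would use an HTTP client such as aiohttp here."""
    async with sem:                 # cap in-flight requests
        await asyncio.sleep(0.01)   # stand-in for network I/O
        return url, f"<html>content of {url}</html>"

async def crawl(urls: list[str], max_concurrency: int = 10) -> dict[str, str]:
    """Fetch all URLs concurrently and map each to its content."""
    sem = asyncio.Semaphore(max_concurrency)
    results = await asyncio.gather(*(fetch(u, sem) for u in urls))
    return dict(results)

pages = asyncio.run(crawl([f"https://example.com/page{i}" for i in range(50)]))
```

Because the sleeps overlap, the 50 fetches complete in roughly 5 batches rather than 50 sequential waits, which is the same effect that collapsed the 5h 18m crawl.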
- Enhanced LLM Interaction Workflow (5 min)
- Classifying queries (gibberish, general chat, site-specific)
- Rewriting into semantically rich variants for broader recall
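Classification and rewriting can be sketched with simple heuristics standing in for the LLM calls; the three categories mirror the bullets above, while the rules and variant templates are illustrative assumptions.

```python
def classify(query: str) -> str:
    """Route a query into one of: gibberish, general_chat, site_specific."""
    q = query.strip()
    if len(q) < 3 or not any(ch.isalpha() for ch in q):
        return "gibberish"
    if q.lower().rstrip("?!") in {"hi", "hello", "thanks", "how are you"}:
        return "general_chat"
    return "site_specific"

def rewrite(query: str) -> list[str]:
    """Expand a site-specific query into semantically richer variants
    for broader recall (a real system would prompt an LLM for these)."""
    q = query.strip().rstrip("?")
    return [q,
            f"What does the site say about {q}?",
            f"{q} details and documentation"]
```

Routing gibberish and small talk away from retrieval is what saves tokens: only site-specific queries pay the cost of embedding, search, and re-ranking.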
- Hybrid Re-Ranking Strategy (5 min)
- Scoring snippets by semantic relevance, page-level weight, domain authority
- Applying weighted formula (α, β, γ) to surface the most pertinent chunks
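The weighted formula amounts to score = α·semantic + β·page_weight + γ·authority. A small sketch; the weight values and field names are illustrative assumptions, since the talk's actual α, β, γ are not given here.

```python
def rerank(snippets: list[dict], alpha: float = 0.6,
           beta: float = 0.25, gamma: float = 0.15) -> list[dict]:
    """Re-rank snippets by a weighted blend of semantic relevance,
    page-level weight (e.g. inverse crawl depth), and domain authority.
    Each snippet dict carries scores already normalised to [0, 1]."""
    def score(s: dict) -> float:
        return (alpha * s["semantic"]
                + beta * s["page_weight"]
                + gamma * s["authority"])
    return sorted(snippets, key=score, reverse=True)

snippets = [
    {"id": "deep-page", "semantic": 0.9, "page_weight": 0.2, "authority": 0.3},
    {"id": "home-page", "semantic": 0.7, "page_weight": 0.9, "authority": 0.9},
]
ranked = rerank(snippets)
```

Here the structural metadata flips the ranking: the slightly less semantically similar but shallower, higher-authority page wins (0.78 vs. 0.635), which is exactly how hybrid re-ranking filters noisy initial search results.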
- Comparison: Traditional vs. Our RAG-Driven Approach (3 min)
- Traditional: raw LLM calls → token bloat, context loss
- Ours: RAG + hybrid re-ranking → targeted retrieval, cost-effective precision
- Demonstration & Insights (3 min)
- End-to-end flow: user query → initial search → re-ranking → final LLM answer
- Tips on balancing accuracy, latency, and token budget
- Conclusion & Q&A (5 min)
- Applying this pipeline in your own projects
- Open floor for questions and future directions
Prerequisites:
Attendees should have a basic understanding of:
- Python 3.8+ and familiarity with asyncio for concurrent programming.
- Large Language Models (LLMs) and text embeddings.
- Experience with vector databases (e.g., Pinecone, FAISS) or general retrieval systems.
- General familiarity with REST APIs and prompt engineering best practices.
Intermediate
Charan Teja C S
AI/ML Engineer, OSI Digital
Currently designing and building Python FastAPI backends and AI/ML-driven solutions, Charan is a language-agnostic developer who tackles problems with a fundamentals-first mindset.
A graduate of Osmania University (B.E. CSE), he’s active in competitive programming and, during an internship, developed a face-recognition module.
His academic capstone—a FastAPI-powered robotic greenhouse monitor using ML for leaf-disease detection—demonstrated his full-stack and AI expertise.
At OSI Digital, Charan has delivered two highly praised chatbots (HR policy and website inquiry), holds AWS certifications with Docker deployment experience, and continues to deepen his strengths in AI/ML and cloud computing.
Architect | Product Manager | Engineering Leader | AI
Srikanth is an Architect, Product Manager, and Engineering Lead at OSI Digital Pvt Ltd.
He leads an exceptional team of 20 engineers, crafting innovative solutions that shape the future, one project at a time.
What I Do:
As an Architect and AI Innovator, I specialize in turning ideas into scalable, cutting-edge solutions across AWS, Azure, and GCP. From building microservices and cloud-native systems to powering real-time data streams with NodeJS, Python, and AWS IoT Core, I bring tech to life.
AI is my playground – whether it's AWS Rekognition, Google Vision AI, or GenAI, I’m constantly designing smarter, more intuitive technologies.
Leading a team of talented engineers, I’m dedicated to mentoring, empowering, and delivering high-impact results. Whether it’s creating sleek apps with React, Angular, and Ionic, or building interactive chatbots using DialogFlow and Watson, I’m always pushing the boundaries of what’s possible.
Let’s connect and build something extraordinary – srikanth.cloudarch@gmail.com
Explore more about our work at OSI Digital Pvt Ltd