Anindita Sinha Banerjee
With over a decade in Data and Decision Sciences, I design NLP and AI solutions that solve complex business challenges. Currently a Data Scientist at Red Hat and formerly a researcher at the Tata Research Development and Design Center, I have presented research at premier conferences and hold patents advancing AI-driven innovation. Explore my Google Scholar page: https://scholar.google.com/citations?user=5GCQcVkAAAAJ&hl=en&oi=ao
https://www.linkedin.com/in/anindita-sinha-banerjee-41a99956/
Session
LLMs like GPT-4 can consume as much energy per query as an entire web search session. What if we could cut that with Python-powered vLLM? In this session, we'll explore how vLLM, a Python-powered, high-throughput inference engine, enables green AI deployment by drastically improving GPU efficiency. We'll cover techniques like PagedAttention, continuous batching, and speculative decoding, showing how they reduce latency, memory overhead, and energy usage per token. We'll also dive into the role of LLM Compressor, a lightweight compression framework that shrinks model size while preserving accuracy, further slashing inference costs and power consumption. If you're interested in sustainable LLM deployment, GPU optimization, or how Python can lead the charge in green computing, this talk is for you.