PyCon India 2025

Abhijit Roy

I’m a Software Maintenance Engineer focused on quantizing and fine-tuning large language models (LLMs). I work with tools and frameworks like vLLM, LLM Compressor, InstructLab, and RHEL AI to optimize and maintain high-performance AI systems.
My work revolves around making LLMs more efficient, scalable, and adaptable for real-world use cases—whether it’s reducing inference costs, enhancing model alignment, or supporting enterprise-grade AI deployments.


Professional Link

https://www.linkedin.com/in/abhijit-roy-658571146/


Session

09-14
14:50
30min
Green AI at Scale: Energy-Efficient LLM Serving using vLLM & LLM Compressor
Anindita Sinha Banerjee, Abhijit Roy

LLMs like GPT-4 can consume as much energy per query as an entire web search session. What if Python could help us cut that cost? In this session, we'll explore how vLLM, a Python-powered, high-throughput inference engine, enables green AI deployment by drastically improving GPU efficiency. We'll cover techniques like PagedAttention, continuous batching, and speculative decoding, showing how they reduce latency, memory overhead, and energy usage per token. We'll also dive into the role of LLM Compressor, a lightweight compression framework that shrinks model size while preserving accuracy, further slashing inference costs and power consumption. If you're interested in sustainable LLM deployment, GPU optimization, or how Python can lead the charge in green computing, this talk is for you.
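As a taste of the workflow the session covers, here is a minimal sketch of offline inference with vLLM's Python API. It assumes vLLM is installed and a GPU is available; the tiny facebook/opt-125m model is used only as a stand-in, and in practice you would point it at a quantized checkpoint produced with LLM Compressor.

```python
# Minimal vLLM offline-inference sketch (assumes vLLM is installed and a GPU is available).
# facebook/opt-125m is a small placeholder model; swap in a compressed checkpoint
# (e.g. one quantized with LLM Compressor) for realistic energy-efficiency numbers.
from vllm import LLM, SamplingParams

prompts = [
    "Explain how PagedAttention reduces GPU memory waste.",
    "Why does continuous batching improve energy efficiency per token?",
]

# Low temperature and a token cap keep the example cheap and mostly deterministic.
sampling_params = SamplingParams(temperature=0.2, max_tokens=128)

# vLLM applies PagedAttention and continuous batching internally;
# gpu_memory_utilization caps how much VRAM the engine (mainly the KV cache) may claim.
llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.90)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text.strip())
```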

Others
Track 3