BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//cfp.in.pycon.org//2025//SE8K9J
BEGIN:VTIMEZONE
TZID:IST
BEGIN:STANDARD
DTSTART:20000101T000000
RRULE:FREQ=YEARLY;BYMONTH=1
TZNAME:IST
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-2025-SDPA9X@cfp.in.pycon.org
DTSTART;TZID=IST:20250914T145000
DTEND;TZID=IST:20250914T152000
DESCRIPTION:LLMs like GPT-4 can consume as much energy per query as an enti
 re web search session. What if we could cut that with Python-powered vLLM?
  In this session\, we'll explore how vLLM\, a Python-powered\, high-throug
 hput inference engine\, enables green AI deployment by drastically improvi
 ng GPU efficiency. We'll cover techniques like PagedAttention\, continuous
  batching\, and speculative decoding\, showing how they reduce latency\, m
 emory overhead\, and energy usage per token. Additionally\, we'll dive int
 o the role of the LLM Compressor\, a lightweight compression framework tha
 t shrinks model size while preserving accuracy—further slashing inferenc
 e costs and power consumption. If you're interested in sustainable LLM dep
 loyment\, GPU optimization\, or how Python can lead the charge in green co
 mputing\, this talk is for you.
DTSTAMP:20260317T131951Z
LOCATION:Track 3
SUMMARY:Green AI at Scale: Energy-Efficient LLM Serving using vLLM & LLM Co
 mpressor - Abhijit Roy\, Anindita Sinha Banerjee
URL:https://cfp.in.pycon.org/2025/talk/SDPA9X/
END:VEVENT
END:VCALENDAR
