2025-09-14 – Track 1
Large language models (LLMs) don’t always need to be bigger to be better—sometimes, they just need to think more efficiently. Inference-time compute scaling improves LLM reasoning by dynamically increasing computational effort during inference, similar to how humans perform better when given more time to think. This session will explore cutting-edge techniques like Chain-of-Thought (CoT) prompting, voting-based search, and the latest research in test-time scaling. We’ll dive into methods like “Wait” tokens, preference optimization, and dynamic depth scaling, which allow models to refine responses on the fly. Whether you're interested in boosting LLM accuracy, improving robustness, or optimizing compute budgets, this talk will provide key insights into the future of smarter, more adaptable AI.
Session Breakdown
- Introduction: Why Scaling Compute at Inference Matters
  - The challenge: LLMs often struggle with complex reasoning tasks.
  - The traditional approach: scaling model size vs. scaling inference-time compute.
  - The key idea: letting models “think harder” rather than just making them bigger.
- Core Techniques for Inference-Time Scaling
  - Chain-of-Thought (CoT) prompting – guiding models to reason step by step.
  - Voting-based methods (Self-Consistency, Majority Voting) – leveraging multiple generations to improve answers.
  - Test-time scaling strategies – making LLMs more adaptable without retraining.
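As a flavor of the voting-based methods covered in the session, here is a minimal self-consistency sketch. `sample_answers` is a hypothetical stand-in for repeated chain-of-thought generations from an LLM API; only the majority-vote step is the actual technique:

```python
from collections import Counter

def majority_vote(answers):
    # Self-consistency's aggregation step: return the most common
    # final answer across independently sampled reasoning chains.
    return Counter(answers).most_common(1)[0][0]

def sample_answers(prompt, n=5):
    # Hypothetical stub: in real use, call an LLM n times with
    # temperature > 0, let each generation reason step by step,
    # and extract the final answer from each chain.
    raise NotImplementedError("plug in your model call here")

# Aggregating five simulated final answers:
# majority_vote(["42", "42", "41", "42", "40"]) -> "42"
```

A single greedy generation can follow one flawed reasoning path; sampling several chains and voting lets occasional errors get outvoted, trading extra inference compute for accuracy.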
- Emerging Techniques and Research Insights
  - “Wait” tokens & adaptive compute – dynamically adjusting processing power based on task complexity.
  - Preference optimization – refining responses using feedback loops at inference.
  - Dynamic depth scaling – allocating more compute to harder questions.
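To illustrate the "more compute for harder questions" idea at the sampling level (the published dynamic-depth work operates inside the model; this is a simplified analogue, and all names here are hypothetical), one can keep drawing samples only while the answers disagree:

```python
from collections import Counter

def agreement(answers):
    # Fraction of samples that match the modal answer.
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

def adaptive_vote(sample_fn, prompt, start=3, step=2, max_n=11, threshold=0.8):
    # Start with a small budget; easy questions converge quickly and
    # stop early, while disagreement triggers extra samples up to max_n.
    answers = [sample_fn(prompt) for _ in range(start)]
    while agreement(answers) < threshold and len(answers) < max_n:
        answers.extend(sample_fn(prompt) for _ in range(step))
    return Counter(answers).most_common(1)[0][0], len(answers)
```

With a model that always agrees with itself, this spends only the initial three samples; a question that produces scattered answers consumes the full budget before voting.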
- Practical Applications & Trade-offs
  - When to use inference-time scaling vs. pre-training improvements.
  - Computational costs and latency considerations.
  - Real-world use cases: where these methods make the biggest impact.
- Future of Smarter LLMs
  - The shift from static models to adaptive, reasoning-driven AI.
  - Open research questions and challenges ahead.
  - What’s next: opportunities for improving efficiency and accuracy.
Familiarity with LLMs, prompt engineering, and basic-to-intermediate AI/ML concepts will help attendees get the most out of this session.
Target Audience – Intermediate
Data Science, ML, and AI | Ex-Microsoft | Mentor | Speaker
About Me
Principal Data Science and AI Leader with 13+ years of experience in building impactful intelligent solutions across various domains. Skilled in leading large-scale projects, guiding cross-functional teams, and fostering a culture of innovation and excellence.
Expertise in:
- Machine Learning, Natural Language Processing, and Generative AI
- Driving business growth and operational efficiency
- Enhancing customer satisfaction through cutting-edge AI solutions
Proven Track Record:
- Recognized for strong leadership and strategic thinking
- Passionate about mentoring future talent
- Frequent speaker and thought leader in the data science community
- Numerous accolades for innovation and excellence
Let's connect!
🔗 LinkedIn: https://www.linkedin.com/in/agnitin/
Applied ML Scientist with over 13 years of experience working on scalable, impact-driven ML and LLM solutions.