Rutvik Acharya
Applied ML scientist with over 13 years of experience building scalable, impact-driven ML and LLM solutions.
Session
Large language models (LLMs) don't always need to be bigger to be better; sometimes they just need to think more efficiently. Inference-time compute scaling improves LLM reasoning by dynamically increasing computational effort during inference, much as humans perform better when given more time to think. This session will explore cutting-edge techniques such as Chain-of-Thought (CoT) prompting, voting-based search, and the latest research in test-time scaling. We'll dive into methods like "Wait" tokens, preference optimization, and dynamic depth scaling, which let models refine their responses on the fly. Whether you're interested in boosting LLM accuracy, improving robustness, or optimizing compute budgets, this talk will offer key insights into the future of smarter, more adaptable AI.
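To make the core idea concrete, here is a minimal sketch of voting-based search (self-consistency), one of the techniques above: it spends extra inference-time compute by sampling several chain-of-thought completions and majority-voting their final answers. The `generate` callable and the "Answer:" output convention are assumptions for illustration, not any specific model's API.

```python
from collections import Counter

def self_consistency_answer(generate, prompt, n_samples=8, temperature=0.8):
    """Voting-based search (self-consistency): sample several
    chain-of-thought completions and majority-vote the final answers.

    `generate` is an assumed stand-in for any LLM sampling call that
    takes a prompt and temperature and returns a completion string.
    """
    answers = []
    for _ in range(n_samples):
        # Elicit step-by-step reasoning so each sample "thinks" before answering.
        completion = generate(prompt + "\nLet's think step by step.",
                              temperature=temperature)
        # Assumption: the model ends its reasoning with a line like "Answer: <value>".
        for line in reversed(completion.splitlines()):
            if line.lower().startswith("answer:"):
                answers.append(line.split(":", 1)[1].strip())
                break
    # More samples means more inference-time compute and a more reliable vote.
    return Counter(answers).most_common(1)[0][0] if answers else None
```

The trade-off is exactly the one the session examines: accuracy tends to improve with `n_samples`, at the cost of a proportionally larger inference budget.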