Preethi Srinivasan
Preethi has an MS (by Research) from IIT Mandi, where her thesis focused on computer vision, specifically medical image post-processing. She is the first author on publications at ACCV, WiML, and IEEE CBMS. At Sahaj, she has built ML prototypes for video understanding, LLM fine-tuning, and RAG-based QA systems. She authored a blog series on LoRA and Intrinsic Dimension, which led to speaking engagements at PyCon India 2024 and The Fifth Elephant 2025.
She
Speaker Tagline – Machine Learning | Efficient Training at Scale
Sessions
What does it take to move neural network training from a single device to multiple devices? Data dependencies and memory layout, which are handled easily in a simple single-device scenario, must now be managed in a distributed computing setting.
We'll take a simple network training example and, from first principles, introduce the basic primitives that a distributed computing framework must support to spread the computation over multiple nodes.
We'll cover the principles with the help of PyTorch, using DDP (DistributedDataParallel) and FSDP (FullyShardedDataParallel) to describe the computation and introducing the fundamentals of the network primitives that enable them.
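As a rough illustration of the kind of primitive this session refers to, the sketch below simulates data-parallel gradient averaging in plain Python. It is not taken from the talk and does not use PyTorch: `local_gradient` and `all_reduce_mean` are hypothetical helper names, the linear model and data are made up, and the "workers" are just list slices in a single process. The point it tries to show is that, with equal-sized shards, averaging per-worker gradients reproduces the single-device gradient.

```python
# Plain-Python simulation of data-parallel training for the model y = w * x + b.
# No real processes or network are involved; all_reduce_mean is a hypothetical
# stand-in for a collective operation that a distributed framework would provide.

def local_gradient(w, b, shard):
    """Mean-squared-error gradient (dw, db) computed on one worker's shard."""
    n = len(shard)
    dw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    db = sum(2 * (w * x + b - y) for x, y in shard) / n
    return dw, db

def all_reduce_mean(per_worker_grads):
    """Average gradients across workers; every worker would get this same result."""
    k = len(per_worker_grads)
    return tuple(sum(g[i] for g in per_worker_grads) / k for i in range(2))

# Made-up dataset generated from y = 3x + 1.
data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
num_workers = 4
shards = [data[i::num_workers] for i in range(num_workers)]  # equal-sized shards

w, b = 0.0, 0.0
single_device = local_gradient(w, b, data)
distributed = all_reduce_mean([local_gradient(w, b, s) for s in shards])

print("single device:", single_device)
print("after all-reduce:", distributed)
```

The equivalence relies on every shard having the same number of samples; with uneven shards the average would need to be weighted by shard size.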
Large pre-trained models are now the norm, making Parameter-Efficient Fine-Tuning techniques like LoRA essential to reduce computational and storage costs. But why do these methods work so well? This talk explores the theory of Intrinsic Dimension (ID)—the idea that neural networks often need far fewer effective directions to learn a task than their total parameters suggest.
We’ll estimate a task’s ID via random subspace training on an MLP for MNIST, reproducing results from foundational papers. Then we’ll examine how LoRA approximates subspace training, comparing the two in compute, training time, and accuracy to clarify key design trade-offs. LoRA succeeds not just through engineering but because it exploits the low-dimensional structure revealed by ID.
We also highlight PyTorch internals that enable flexible subspace training. This talk builds on a four-part blog series bridging theory and engineering.
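To make the idea of random subspace training concrete, here is a minimal sketch in plain Python rather than PyTorch. It is not the talk's MNIST experiment: the toy loss, the sizes `D`, `k`, `d`, the learning rate, and the helper names are all assumptions chosen for illustration. The full parameter vector is written as `theta = theta0 + P @ z` with `theta0` and the random matrix `P` frozen, so only the `d` entries of `z` are trained. The last helper counts the trainable parameters LoRA would use for a single weight matrix, assuming the usual rank-`r` factorization into two matrices.

```python
import random

random.seed(0)
D, k, d = 50, 3, 10  # total params, toy task's intrinsic dim, subspace dim (illustrative)

theta0 = [0.0] * D   # frozen starting point
# Frozen random projection P (D x d); only the d entries of z are trained.
P = [[random.gauss(0.0, 1.0) / d ** 0.5 for _ in range(d)] for _ in range(D)]

def params(z):
    """Map subspace coordinates to full parameters: theta = theta0 + P @ z."""
    return [theta0[i] + sum(P[i][j] * z[j] for j in range(d)) for i in range(D)]

def loss_and_grad(theta):
    """Toy loss 0.5 * sum((theta_i - 1)^2) over only the first k coordinates."""
    grad = [theta[i] - 1.0 if i < k else 0.0 for i in range(D)]
    return 0.5 * sum(g * g for g in grad), grad

z = [0.0] * d
initial_loss, _ = loss_and_grad(params(z))
for _ in range(1000):
    _, g_theta = loss_and_grad(params(z))
    # Chain rule: dL/dz = P^T @ dL/dtheta
    g_z = [sum(P[i][j] * g_theta[i] for i in range(D)) for j in range(d)]
    z = [zj - 0.3 * gj for zj, gj in zip(z, g_z)]  # plain gradient descent step
final_loss, _ = loss_and_grad(params(z))

def lora_trainable_params(n_in, n_out, rank):
    """Trainable entries when a rank-`rank` update B @ A replaces a full n_out x n_in update."""
    return rank * (n_in + n_out)

print("trained", d, "of", D, "parameters; loss", initial_loss, "->", final_loss)
print("LoRA params for a hypothetical 784 x 200 layer at rank 4:",
      lora_trainable_params(784, 200, 4))
```

Because the toy loss depends on only `k` coordinates, a random subspace with `d >= k` dimensions is enough to reduce it, which mirrors the intuition behind measuring ID by finding the smallest subspace that still trains well.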