PyCon India 2025

Shalini Harkar

I am a distinguished AI specialist and advocate based in Bengaluru, India, with over 12 years of experience in AI/ML and product innovation. I am currently a Lead AI Advocate at IBM, where I support the adoption of AI technologies, including IBM's Granite models and WatsonX platform, across various industries. My role involves delivering technical content, building product proficiency, and offering insights on AI trends to shape strategies and solutions aligned with organizational goals. I am also pursuing a PhD in Applied AI at IIITA.


Professional Link

https://www.ibm.com/think/author/shalini-harkar

Preferred Pronoun

She/Her

Speaker Tagline

Lead AI Advocate

Gravatar - Professional Photo

https://gravatar.com/profile/onboarding/verified-accounts/linkedin

LinkedIn Profile

https://www.linkedin.com/in/shalini-harkar-7140284a/


Session

09-12
14:00
180min
Preparing Data for LLM Applications Using Data Prep Kit
Shalini Harkar

When developing large language model (LLM) applications, data preparation is the most crucial and most often overlooked stage of the development process. The quality, structure, and alignment of the training data largely determine model performance. Failing to invest time in data preparation at the outset leads to poor workflows, longer development time, and models that do not meet performance expectations.
In this workshop, participants will work on real-world problems such as duplication, variability, and noise in large datasets. Using the open-source Data Prep Kit (DPK), we will show how to clean, deduplicate, and structure data for LLM tasks, including building a Retrieval-Augmented Generation (RAG) chatbot. Each participant will leave with hands-on experience and reusable workflows that speed up development and improve outcomes.
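To give a flavour of the clean, deduplicate, and structure steps mentioned above, here is a minimal sketch in plain Python. It does not use the Data Prep Kit API; the function names `clean`, `deduplicate`, and `chunk` are hypothetical, and exact-match hashing and fixed-size word windows are simplifying assumptions chosen for illustration only.

```python
import hashlib
import re


def clean(text: str) -> str:
    """Collapse runs of whitespace and trim the ends (a very basic noise filter)."""
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first copy of each document, comparing a hash of the
    cleaned, lowercased text (exact deduplication only, not fuzzy)."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        cleaned = clean(doc)
        digest = hashlib.sha256(cleaned.lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(cleaned)
    return unique


def chunk(text: str, max_words: int = 50) -> list[str]:
    """Split text into fixed-size word windows, e.g. for indexing in a RAG store."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


if __name__ == "__main__":
    raw = ["Hello   World", "hello world ", "Another document"]
    for doc in deduplicate(raw):
        print(chunk(doc, max_words=2))
```

Real pipelines typically add fuzzy deduplication and format-aware chunking on top of steps like these; the sketch only shows the overall shape of the workflow.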

AI, ML, Data Science
Room 7