2025-09-14 –, Track 1
Python's GIL (Global Interpreter Lock) keeps the interpreter itself thread-safe, but it becomes a limiting factor when you need truly parallel CPU-bound tasks without the overhead of multiprocessing, since asyncio provides only I/O concurrency.
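The limitation described above can be sketched with a small standard-library script. This is only an illustration, not the demo from the talk: the function names (count_down, run_sequential, run_threaded) and the loop size are assumptions chosen for the example. It runs the same CPU-bound loop once sequentially and once across threads and prints both timings.

```python
import threading
import time


def count_down(n: int) -> int:
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1
    return n


def run_sequential(n: int, workers: int) -> float:
    """Run the loop `workers` times back to back and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n: int, workers: int) -> float:
    """Run the loop in `workers` threads at once and return elapsed seconds."""
    threads = [
        threading.Thread(target=count_down, args=(n,)) for _ in range(workers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n, workers = 2_000_000, 4  # hypothetical sizes, tune for your machine
    print(f"sequential: {run_sequential(n, workers):.3f}s")
    print(f"threaded:   {run_threaded(n, workers):.3f}s")
```

On a standard CPython build the two timings are typically close, because only one thread executes Python bytecode at a time. Exact numbers depend on the machine and interpreter version, and an experimental free-threaded build may behave differently.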
In this talk we will discuss how to bypass the GIL to improve performance by writing custom routines with Cython and its powerful nogil directive, and how to identify and resolve performance bottlenecks.
As the culmination of this talk, we will use Cython and nogil to implement SGD (Stochastic Gradient Descent, an algorithm crucial to Deep Learning models), benchmark it against a pure Python implementation, and see the performance gains.
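As a rough idea of what a pure Python SGD baseline could look like, here is a minimal sketch. It assumes a one-dimensional linear regression with squared-error loss. The talk's actual model, data, and hyperparameters are not given in this abstract, so make_line_data, sgd_linear_regression, the learning rate, and the epoch count are all hypothetical.

```python
import random
import time


def make_line_data(n=41, slope=2.0, intercept=1.0):
    """Noiseless points on y = slope * x + intercept, with x spread over [-2, 2]."""
    xs = [-2.0 + 4.0 * i / (n - 1) for i in range(n)]
    ys = [slope * x + intercept for x in xs]
    return xs, ys


def sgd_linear_regression(xs, ys, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w * x + b with one-sample-at-a-time gradient steps."""
    rng = random.Random(seed)  # local generator keeps the run reproducible
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b


if __name__ == "__main__":
    xs, ys = make_line_data()
    start = time.perf_counter()
    w, b = sgd_linear_regression(xs, ys)
    elapsed = time.perf_counter() - start
    print(f"w={w:.4f}, b={b:.4f}, elapsed={elapsed:.4f}s")
```

The inner loop over samples is interpreted Python arithmetic. That is the kind of hot path a Cython version would presumably give static types and run without the GIL. The timing line gives a baseline to compare against.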
- Introduction - Introduce myself, summarize the topic, and state what I hope to achieve by the end of the talk.
- Introduce the GIL:
  - What is the GIL? Why does it exist?
  - The impact of the GIL on execution, with a focus on the CPython-specific implementation. Might also cover other implementations like Jython or IronPython.
  - Why it is not suitable for certain workloads, and a discussion of existing methods of mitigating it: multiprocessing and asyncio.
  - Demonstrate an example of the GIL limitation with a simple Python script.
- Advanced mitigation strategies - talk about the different advanced strategies we can adopt to mitigate GIL limitations, like using Cython with nogil and writing custom GIL-releasing extensions in Rust/C++/C etc.
- Cython with nogil - this will be the primary topic of the talk. We'll go deep into the details of Cython.
  - What is Cython, what does it do?
  - What is the nogil directive?
  - A quick demo of how Cython works.
  - Talk about the extensibility of Cython and how it can work with openmp, especially with the nogil directive.
  - Talk about SGD (Stochastic Gradient Descent), where it is used and how we will be using it as an example for parallel processing.
  - Show an example of SGD with a Python implementation, highlighting the requirement of parallel processing. Benchmark the time here.
  - Show an example of SGD with a Cython implementation, showing the improved performance due to the nogil directive.
- Conclusion - Conclude the talk, summarize all that we've discussed, and open for Q&A.
Basic understanding of multi-threading and multi-processing should suffice.
Target Audience – Intermediate
I'm a Software Engineer with close to 6 years of experience, currently working at Autodesk, where I design and implement high-throughput data pipelines to ingest and process petabytes of data. I love engineering discussions about performance tuning and optimization patterns for languages like Python, which are often dismissed as just "scripting languages". In my off-time, I go trekking, watch anime, collect watches and play video games.