PyCon India 2025

Celery and Kubernetes for fast, scalable, and robust workflow orchestration
2025-09-14, Track 3

Can Celery do more than just background tasks? Absolutely.

In this session, I'll share a real-world use case where we transformed Celery into a full-fledged workflow orchestrator — handling serial and parallel task execution, task retries, dynamic routing, and queue management — all running at scale on Kubernetes.

Pairing Kubernetes' elasticity with Celery's simplicity, this setup replaced a slower, costlier serverless architecture built on AWS Lambda and Step Functions. You’ll learn how Celery’s canvas patterns, task routing, and worker isolation let us orchestrate complex ML pipelines efficiently, and why it's still a powerful, underappreciated tool for scalable, production-grade systems.

If you're tired of serverless trade-offs and want more control without giving up scalability, this talk is for you.


Session Outline

1. The humble beginnings: Celery for background tasks

We’ll start by revisiting what Celery was originally designed for — lightweight background task processing in Django apps. From sending emails to exporting reports, it has quietly powered asynchronous workloads in Python for years.

2. Why background jobs weren’t enough: the need for orchestration

As systems evolved, so did complexity. We needed more than just fire-and-forget background jobs — we needed workflows. I’ll explain how this need led us to explore Celery beyond its original use case.

3. Celery as a workflow orchestrator: a real-world alternative to Airflow and Step Functions

We'll walk through how Celery became our production-grade orchestrator — orchestrating multi-stage pipelines that would typically be handled by tools like Airflow or AWS Step Functions. This shift allowed us to unify orchestration and task execution in one codebase.

4. Canvas workflows: chaining, grouping, and retrying tasks the Celery way

Celery’s canvas API offers a powerful set of tools for composing workflows: task chaining, grouping, nesting, automatic retries, named tasks, and even routing. These building blocks let us express complex business logic with clean, readable code.

5. Optimizing for scale: deploying Celery on Kubernetes

To handle real-world scale, we containerized our Celery workers and deployed them on Kubernetes. This gave us full control over resource allocation — CPU, RAM, and scaling policies — per worker type, tailored to the tasks they run.

6. Speeding things up: choosing the right concurrency model

Celery supports multiple worker pool implementations, such as prefork, gevent, and eventlet. We’ll break down how we matched the pool to the workload (CPU-bound vs I/O-bound) to optimize latency and throughput for different stages of our pipeline.

7. Real-time data and ML: pandas operations and model inference at scale

Our workflows included pandas-heavy data transformations and ML inference. Running these in Celery workers on Kubernetes let us cache models locally, improve performance, and reduce overhead — all while staying fully asynchronous.

8. Observability matters: logging, monitoring, and debugging

Celery offers pre-run and post-run hooks for injecting custom logic — perfect for adding instrumentation. We combined this with the EFK (Elasticsearch, Fluentd, Kibana) stack to gain deep insights into task behavior and failures across our cluster.

9. Cutting down cloud cost: goodbye Lambda, Step Functions, and CloudWatch

By moving off the serverless stack (AWS Lambda + Step Functions + CloudWatch), we saved significantly on cost while gaining greater control, faster execution times, and easier debugging in local and staging environments.

10. Cloud-agnostic deployment: freedom to run anywhere

Because our solution runs entirely on Kubernetes with open-source tools, it's fully cloud-agnostic. Whether you're on AWS, GCP, Azure, or even on-prem — this architecture scales with you, without vendor lock-in.

Conclusion: Celery — simple, powerful, production-grade

Celery is often underestimated as “just” a background task tool. But when paired with Kubernetes and used thoughtfully, it becomes a scalable, modular workflow engine. This talk is about pushing Python infrastructure further — proving that with the right setup, even traditional tools like Celery can scale with modern engineering demands.


Attendees will walk away with actionable insights, deployment strategies, and architectural patterns for using Celery and Kubernetes to build fast, cost-efficient, and scalable data or ML pipelines in production.


Prerequisites

Attendees should have a basic understanding of Python programming and asynchronous task processing concepts. Familiarity with Celery for background jobs and some experience with Kubernetes will help in grasping the deployment and scaling discussions. Prior exposure to workflow orchestration tools like Airflow or AWS Step Functions is a plus but not required.

Additional Resources

https://medium.com/itnext/building-a-production-grade-workflow-orchestrator-with-celery-ad2d48aa054d

https://medium.com/itnext/back-to-the-future-the-future-might-not-just-be-serverless-after-all-b8165d6e84c2

https://dev.to/akarshan/the-curious-case-of-celery-work-flows-39f7

Target Audience

Intermediate

Param Rajani is a Software Engineer at GoDaddy with around three years of experience, having worked in startups as a Data Engineer before being promoted to Senior Data Engineer.
His expertise lies in building scalable workflows on both serverless and Kubernetes platforms, having spearheaded product deployments in these domains. Param has witnessed the best and worst of scaling challenges, designing systems on Kubernetes and AWS, optimizing existing architectures, and crafting maintainable codebases. He brings hands-on experience with Python, Kubernetes, and a wide range of AWS cloud services—all achieved at an early stage in his career.

Connect with Param on LinkedIn