PyCon India 2025

Optimising Deep Neural Inference for Edge Devices: Tools, Pipelines, and Techniques
2025-09-12, Room 5

Deploying real-time AI models on embedded Linux platforms such as the Raspberry Pi, Jetson Nano, or Rockchip-based boards is a growing need in industries like manufacturing, healthcare, and automotive. The challenges, however, are real: constrained compute, tight memory, and limited power budgets. This hands-on workshop walks you through the full lifecycle—designing, optimising, cross-compiling, and deploying lightweight CNNs for inference at the edge.
Participants will start with a base CNN (e.g., MobileNet or ShuffleNet), apply model compression techniques like pruning and quantisation, and then learn how to build optimised deployment pipelines using TensorFlow Lite and PyTorch Mobile. We'll also touch upon using NPU accelerators and real-time profiling to hit performance targets. By the end, participants will be able to deploy and benchmark a real model on an embedded Linux system.


Introduction - Motivation: Real-world use cases (smart cameras, IoT sensors, etc.)
- Overview of embedded platforms (Jetson, Rockchip, Pi)

Model Selection and Design - Lightweight CNNs: MobileNet, ShuffleNet, SqueezeNet
- Trade-offs: Accuracy vs. Latency vs. Size
- Hands-on: Load and inspect model performance
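To make the size trade-off concrete, here is a back-of-the-envelope sketch of weight memory per model. The parameter counts below are approximate published figures, not measurements from the workshop models:

```python
def model_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Estimate in-memory weight size in MiB (float32 = 4 bytes/param)."""
    return num_params * bytes_per_param / (1024 ** 2)

# Rough parameter counts for the lightweight CNNs discussed above
models = {
    "MobileNetV2": 3_500_000,
    "ShuffleNetV2 (1.0x)": 2_300_000,
    "SqueezeNet 1.1": 1_200_000,
}

for name, params in models.items():
    fp32 = model_size_mb(params)       # float32 weights
    int8 = model_size_mb(params, 1)    # after int8 quantisation
    print(f"{name}: {fp32:.1f} MiB fp32 -> {int8:.1f} MiB int8")
```

This is why quantisation matters on boards with a few hundred MB of usable RAM: the weight footprint alone shrinks roughly 4x going from float32 to int8.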

Model Optimisation Techniques - Quantisation (Post-training, Aware Training)
- Pruning & Knowledge Distillation
- Tools: PyTorch/TF model optimisation toolkits
- Hands-on: Apply optimisations to a CNN
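As an illustration of what post-training quantisation does under the hood, here is a minimal NumPy sketch of symmetric per-tensor int8 quantisation. Real toolkits (TFLite converter, PyTorch's quantisation API) additionally handle per-channel scales, activation ranges, and calibration data; this is only the core weight-mapping idea:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantisation: w ~= scale * q."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32) * 0.1
q, scale = quantize_int8(w)

# Round-trip error is bounded by half a quantisation step
err = np.abs(w - dequantize(q, scale)).max()
assert err <= scale / 2 + 1e-6
```

The bound on the final line is exactly the accuracy/size trade-off the workshop explores: a coarser scale (fewer distinct values) means a smaller, faster model but a larger worst-case weight error.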

Cross-Compiling & Toolchains - Toolchains for ARM: arm-linux-gnueabihf (32-bit), aarch64-linux-gnu (64-bit), Buildroot
- Docker-based emulation
- Hands-on: Compile an optimised TFLite model for ARM
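The Docker-based emulation step can be sketched as below. This is a hypothetical setup, not the workshop's exact commands: the `multiarch/qemu-user-static` and `arm64v8/debian` images are real, but the model and benchmark binary paths under `models/` are placeholders you would substitute with your own artefacts:

```shell
# Register QEMU binfmt handlers so the x86 host can execute aarch64 binaries
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

# Run an ARM64 container and invoke a (pre-built) TFLite benchmark binary
# against the quantised model produced in the previous section
docker run --rm --platform linux/arm64 \
    -v "$PWD/models:/models" \
    arm64v8/debian:bookworm \
    /models/benchmark_model --graph=/models/mobilenet_v2_int8.tflite
```

Emulated runs are useful for verifying that the binary and model load correctly on aarch64; latency numbers under QEMU are not representative of real hardware.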

Runtime & Deployment Pipelines - Frameworks: TensorFlow Lite, PyTorch Mobile
- Hardware acceleration: Coral Edge TPU, NPU on RK3588, Jetson's TensorRT
- Hands-on: Deploy to device or emulated ARM board
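Regardless of framework, a deployment pipeline has the same three stages: preprocess a frame, invoke the runtime, postprocess the output. The framework-agnostic skeleton below uses a stub in place of the runtime call (in the real pipeline, `invoke` would wrap a TFLite `Interpreter` or a PyTorch Mobile module; the stub "model" here is purely illustrative):

```python
import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Normalise an HxWx3 uint8 frame and add a batch dimension."""
    x = frame.astype(np.float32) / 255.0
    return x[np.newaxis, ...]

def invoke(x: np.ndarray) -> np.ndarray:
    """Stub for the runtime call (e.g. interpreter.invoke() in TFLite).

    This placeholder just averages each channel so the pipeline runs
    end-to-end without a real model.
    """
    return x.mean(axis=(1, 2))

def postprocess(logits: np.ndarray) -> int:
    """Return the index of the highest-scoring class."""
    return int(np.argmax(logits, axis=-1)[0])

# A green-dominant frame should pick class index 1 (the G channel)
frame = np.zeros((224, 224, 3), dtype=np.uint8)
frame[..., 1] = 255
assert postprocess(invoke(preprocess(frame))) == 1
```

Keeping these stages as separate functions is what lets the hands-on session swap the CPU interpreter for an NPU-delegated one without touching the pre/post-processing code.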

Performance Benchmarking & Profiling - Tools: perf, htop, nvprof, TFLite Benchmark Tool
- Scheduling & resource management
- Hands-on: Profile and optimise inference latency

Wrap-up - Recap + Resources
- Where to go from here (TinyML, ONNX Runtime, AutoML for edge)

Case Study: Real-Time Object Identification for Drones (Jetson Nano / Raspberry Pi + NPU)

In many drone-based applications—such as agriculture, search and rescue, and surveillance—there’s a need for real-time object identification directly on the drone to avoid latency and connectivity issues associated with cloud inference.
Scenario
A lightweight drone is equipped with a Raspberry Pi 4B or Jetson Nano and a camera module. The goal is to identify specific objects on the ground—e.g., vehicles, people, or crops—using a real-time, optimised model running locally.

Problem
- The drone has constrained compute and battery capacity.
- The model must operate under real-time constraints (<50ms inference time).
- No internet connection during flight; the model must run fully offline.

Solution Covered in Workshop
Participants will replicate a scaled-down version of this use case:
- Use MobileNetV2 or TinyYOLO for object detection.
- Apply quantisation-aware training to reduce model size and power usage.
- Deploy the model using TensorFlow Lite or ONNX Runtime with NPU acceleration (if hardware available).
- Profile latency and FPS during emulated inference.
- Use frame-skipping and scheduling strategies to meet power constraints.
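One frame-skipping strategy can be sketched as a simple duty-cycle gate (the class name and interface here are illustrative, not from an existing library): the camera keeps producing frames, but inference only fires at a capped rate, trading detection freshness for power.

```python
import time

class FrameSkipper:
    """Allow inference at most once per period_s; skip frames in between.

    With a 30 fps camera and period_s=0.1, roughly 10 frames/s reach the
    model, cutting compute (and power draw) by about two thirds.
    """

    def __init__(self, period_s: float, clock=time.monotonic):
        self.period_s = period_s
        self.clock = clock          # injectable for testing
        self._next = 0.0

    def should_infer(self) -> bool:
        now = self.clock()
        if now >= self._next:
            self._next = now + self.period_s
            return True
        return False

# Usage inside the capture loop:
#   skipper = FrameSkipper(period_s=0.1)
#   for frame in camera:
#       if skipper.should_infer():
#           run_inference(frame)   # hypothetical inference call
```

Injecting the clock keeps the scheduling logic testable without a camera, and the same gate can be driven by a battery-level signal to throttle harder as charge drops.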
Hands-on Highlights
- Load sample aerial imagery or simulated video feed.
- Test an optimised model on a Jetson Nano or Pi.
- Compare CPU-only vs. NPU-accelerated inference times.


Target Audience

Intermediate

Prerequisites

Prior knowledge of ML and object detection methods is required.

Specialises in computer vision, time-series forecasting, and scalable MLOps frameworks with over five years of experience in ML engineering.
Develops and deploys production-grade ML solutions in automotive, telematics, and clean-tech domains.
Leads real-time inferencing platform development and ADAS calibration model deployment at Lytx, scaling across 100,000+ devices.
Designed predictive maintenance and anomaly detection models at Nunam Technologies, building MLOps pipelines with Kubeflow, KServe, and MLflow on AWS.
Contributes to open-source projects such as MLPerf-Tiny and delivers technical talks at major ML conferences.