With GPU demand booming and CUDA still a high barrier to entry, Python has become the practical bridge between developer productivity and performance. This workshop teaches hardware-aware GPU programming for Python developers, with a focus on how performance depends less on model code and more on data movement.
This is a fully hands-on workshop focused on writing your first high-performance GPU kernel in Python. Instead of starting with APIs, participants will begin by benchmarking a naive Python GPU kernel and observing why it fails to scale. From there, each section introduces a single hardware concept (memory hierarchy, arithmetic intensity, tiling, or fusion), followed immediately by a coding exercise that applies it.
Participants will progressively transform slow Python kernels into efficient Triton implementations, learning how Python is lowered into PTX and how respecting GPU hardware constraints enables near-CUDA performance. Every concept is reinforced through code, measurement, and performance comparison.
Here is the workshop flow:
Environment Warm-up & Baseline
Why Your GPU Code Is Slow
The Hardware You’re Actually Programming
Decide Before You Optimize: Roofline
Hit the Memory Wall
Hardware-Aware Patterns That Work
Your First Fast Python GPU Kernel
Kill the Memory Wall & Wrap-Up
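As a rough illustration of the roofline step in the flow above, here is a minimal sketch in plain Python. It is not workshop material: the function names are hypothetical, and the peak compute and bandwidth figures are made-up round numbers rather than the specs of any real GPU. It estimates arithmetic intensity (FLOPs per byte moved) for two kernels and caps attainable throughput at the lower of the compute roof and the memory roof.

```python
# Minimal roofline sketch. All hardware numbers below are assumed,
# illustrative values, not measurements of a real device.

PEAK_FLOPS = 10e12        # assumed peak compute: 10 TFLOP/s
PEAK_BANDWIDTH = 500e9    # assumed peak memory bandwidth: 500 GB/s


def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte read from or written to global memory."""
    return flops / bytes_moved


def attainable_flops(intensity, peak_flops=PEAK_FLOPS, peak_bandwidth=PEAK_BANDWIDTH):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_flops, intensity * peak_bandwidth)


def vector_add_intensity(n, dtype_bytes=4):
    """c = a + b: one add per element, two loads and one store per element."""
    return arithmetic_intensity(n, 3 * n * dtype_bytes)


def matmul_intensity(n, dtype_bytes=4):
    """Square n x n matmul, assuming each matrix crosses global memory once."""
    return arithmetic_intensity(2 * n**3, 3 * n * n * dtype_bytes)


if __name__ == "__main__":
    ridge = PEAK_FLOPS / PEAK_BANDWIDTH  # intensity where the two roofs meet
    for name, ai in [("vector add", vector_add_intensity(1_000_000)),
                     ("matmul 1024", matmul_intensity(1024))]:
        bound = "memory-bound" if ai < ridge else "compute-bound"
        print(f"{name}: {ai:.3f} FLOP/byte, "
              f"{attainable_flops(ai) / 1e9:.1f} GFLOP/s attainable, {bound}")
```

Under these assumed numbers, vector addition sits far below the ridge point, so no amount of compute tuning helps it, while a matmul that reuses data well can reach the compute roof. The matmul estimate is optimistic because it assumes perfect reuse, which is the gap that tiling is meant to close.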
By the end of the workshop, participants will be able to:
I am Abhik Sarkar, a machine learning engineer focused on building real-world computer vision systems that run at scale. My work lives at the intersection of software engineering, GPU hardware, and production reliability.
I currently lead machine learning at Cloudastructure, where I design end-to-end vision pipelines spanning high-throughput video ingestion, GPU-accelerated decoding, and low-latency inference. My daily toolset includes PyTorch, TensorRT, NumPy, OpenCV, CuPy, PyCUDA, and ONNX Runtime.
Outside of work, I actively seek technical discussions and regularly attend conferences to understand how engineers around the world approach hard problems. I care as much about learning as I do about sharing, and I make a deliberate effort to pass on whatever I know in a form others can actually use.
In my free time, I cook, and I make chocolate bars from raw cacao beans.