Running one model in production is easy; running many isn't.
At Cassandra, our Django app struggled to manage multiple Bayesian models with messy, fragile scripts. We solved it with Fleets: self-contained workflows that run in isolation, with Django just orchestrating a tiny generated script.
In this talk, I'll show how we cut our orchestration from 150+ lines of fragile code to a ~20-line generated script, and how you can scale complex workloads the same way.
Running statistical models in production sounds straightforward — until you have five of them, each with different dependencies, different runtimes, and different resource appetites. The gap between “a model that works in a notebook” and “a model that runs reliably at scale” is one of the most common engineering bottlenecks in the industry, and one of the least discussed. At Cassandra, we build Marketing Mix Models powered by Bayesian inference. For a long time, our Django monolith managed them through 150+ lines of hand-crafted scripts. Every new model meant manual developer work. Every new feature meant more drift.
So we built Fleets — a pattern where each computational workflow is a self-contained sequence of steps (data retrieval, preprocessing, model fitting, output) encapsulated behind a single callable entry point. Django generates the fleet script, dispatches it to an isolated runtime, and receives the result via callback. The fleet handles everything else.
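To make the shape of the pattern concrete, here is a minimal sketch of what a fleet could look like. All names (`Fleet`, `MeanFleet`, the step methods) are illustrative, not Cassandra's actual code; the point is the Template Method skeleton with one callable entry point:

```python
from abc import ABC, abstractmethod


class Fleet(ABC):
    """A self-contained workflow: the steps are overridable hooks
    (Template Method), and run() is the single callable entry point."""

    def run(self):
        data = self.retrieve_data()
        prepared = self.preprocess(data)
        fitted = self.fit(prepared)
        return self.output(fitted)

    @abstractmethod
    def retrieve_data(self):
        """Fetch the raw inputs for this workflow."""

    def preprocess(self, data):
        return data  # default: pass-through

    @abstractmethod
    def fit(self, data):
        """Fit the model; the expensive, isolated step."""

    def output(self, result):
        return result  # default: return the raw result


class MeanFleet(Fleet):
    """Toy fleet: 'fits' a model by averaging its inputs."""

    def retrieve_data(self):
        return [2.0, 4.0, 6.0]

    def fit(self, data):
        return sum(data) / len(data)


print(MeanFleet().run())  # prints 4.0
```

Because every fleet exposes the same `run()` surface, Django never needs to know what happens inside a given workflow, only how to invoke it and collect the result.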
In this talk, we’ll walk through the architectural decisions behind this pattern — how Python design principles (Template Method, Strategy, Facade, thin orchestration) took us from 150+ lines of fragile logic to a ~20-line generated script — and why the same approach applies to any Django project that needs to orchestrate complex, isolated workloads at scale.
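As a rough illustration of the "thin orchestration" idea, Django's side can be reduced to rendering a small script and shipping it off. The generator below is a hypothetical sketch (the function, module paths, and callback URL are assumptions, not the talk's actual code):

```python
def build_script(fleet_class: str, fleet_module: str, job_id: str,
                 callback_url: str) -> str:
    """Render the thin generated script that Django hands to an
    isolated runtime. The fleet does the work; the script only
    invokes it and reports back via callback."""
    return f'''
import json
from urllib.request import Request, urlopen

from {fleet_module} import {fleet_class}

# The fleet encapsulates every step; this script only orchestrates.
result = {fleet_class}().run()

# Report the outcome back to Django via HTTP callback.
payload = json.dumps({{"job_id": "{job_id}", "result": result}}).encode()
urlopen(Request("{callback_url}", data=payload,
                headers={{"Content-Type": "application/json"}}))
'''


script = build_script("SalesMixFleet", "fleets.sales", "job-42",
                      "https://example.com/fleets/callback/")
print(script)
```

The rendered output stays around a dozen lines no matter how complex the fleet is, which is what keeps the Django side from drifting as new models are added.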
You’ll leave with a reusable architectural pattern — not a product pitch.