State of In-Browser ML: WebAssembly, WebGPU, and the Modern Stack

In-browser ML is no longer a toy. You can run Python in WebAssembly, ship interactive demos as a URL, and do on-device ML and LLM inference with no installs, no servers, and no data leaving the device. This talk maps what’s possible today and the real constraints you must design around.

Over the last few years, the tooling has matured enough to make “ML in a tab” worth taking seriously. Today, you can execute Python code in a sandboxed environment, ship interactive demos as a single URL, and even run LLM inference entirely on-device, without installations, servers, or sending data anywhere. In this talk, we will give a practical overview of the current in-browser ML stack, focusing on what is realistically possible today and the limits you still have to design around.

We will start with interactive environments such as JupyterLite and explain how they work under the hood via Pyodide: what it means to run CPython compiled to WebAssembly, how the filesystem and networking model differ from “normal” Python, and what that implies for performance, I/O, and package support.

We will then move from notebooks to applications with PyScript, showing how the same building blocks can be used to create shareable browser-based tools. We will also briefly cover the lower-level approach: using Pyodide directly and orchestrating it with JavaScript for granular control over loading, packaging, and data interchange.

Finally, we will cover in-browser inference workflows for traditional and deep learning models (via ONNX) and for LLMs (via wllama and WebLLM), and discuss how WebGPU can accelerate these pipelines.

By the end of the talk, you’ll have a clear overview of the in-browser ML ecosystem and the practical intuition to decide whether it’s the right choice for your next project.

Outline:

Introduction + Motivating examples [4 min]
Running Python in WebAssembly [6 min]
    (a) Introduction to Pyodide [2 min]
    (b) Package management [3 min]
    (c) Runtime + memory constraints [1 min]
Overview of interactive dev environments / JupyterLite [4 min]
Developing applications with PyScript + native Pyodide bindings [7 min]
On-device ML inference using ONNX/WebGPU/WebLLM/wllama [5 min]
Q&A [4 min]

Target Audience: This talk is relevant to a broad audience, but at least intermediate knowledge of ML and familiarity with the Python ML ecosystem are required.

Saturday, May 30

11:45 - 12:15

Oleh Kostromin