Workshop

Gemma from Scratch: Coding a Modern Large Language Model Layer by Layer

Saturday, May 30

11:05 - 13:10
Room: Piadina
Language: English
Audience level: Intermediate
Elevator pitch

Stop treating LLMs like magic. In this hands-on tutorial, we build a clone of Google’s Gemma from scratch in PyTorch. You will code the core mechanics—RoPE, RMSNorm, and GQA—layer by layer. Leave with a working model that loads official weights. No magic, just code.

Abstract

Large Language Models (LLMs) often feel like magic black boxes. We use them via APIs, but do we truly understand what happens inside? In this hands-on tutorial, we will demystify modern LLMs by building a functional clone of Google’s Gemma model from scratch using PyTorch.

Moving beyond basic transformer theory, we will implement the specific architectural components that power today’s state-of-the-art models. You will write the code for Rotary Positional Embeddings (RoPE), RMSNorm, Grouped-Query Attention (GQA), and SwiGLU activation functions.
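To give a flavor of the kind of code participants will write, here is a minimal PyTorch sketch of the simplest of these components, RMSNorm; the class name, argument names, and the choice of a plain learned gain (rather than Gemma's exact parameterization) are illustrative assumptions, not the workshop's reference code.

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root mean square normalization: rescale each vector by the inverse
    RMS of its features, then apply a learned per-dimension gain.
    Unlike LayerNorm, there is no mean subtraction and no bias term."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps                               # numerical stability term
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-dimension gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Inverse RMS over the last (feature) dimension.
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight

Dropping LayerNorm's mean subtraction and bias makes the operation cheaper while still stabilizing activations, which is why most recent LLMs use it; the session covers the remaining components (RoPE, GQA, SwiGLU) in the same hands-on style.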

By the end of this session, you will have a working model architecture that can load official pre-trained weights and generate text. This tutorial is designed for Python developers and data scientists who want to bridge the gap between high-level APIs and the low-level mechanics of deep learning. No magic, just code.

Tags: ML and AI
Participant

Luca Massaron

Data scientist and author of books on AI, machine learning, deep learning, and Kaggle. Google Developer Expert (GDE). Kaggle Competitions Grandmaster, previously ranked 7th worldwide in competitions.