Talk

Designing a Rule-Driven Transformation Engine in PySpark (Without Losing Your Sanity)

Saturday, May 30

11:05 - 11:35
Room: Spaghetti
Language: English
Audience level: Intermediate
Elevator pitch

Config-driven transformations often turn PySpark notebooks into fragile monsters. This talk shows how to design a rule-based transformation architecture in Python that stays fast, testable, and maintainable—even as rules, formats, and edge cases keep growing.

Abstract

Rule-based data transformations look simple—until they aren’t. My first attempt was exactly what many of us try: Python loops, conditional logic, and quick fixes inside a PySpark notebook. It worked… until performance dropped, rules multiplied, and testing became nearly impossible.

This talk tells the story of how that initial approach failed, and what changed when I stopped treating transformation rules as code and started treating them as data.

We’ll explore how to design a rule-driven transformation architecture in PySpark where behavior is defined by external configuration, but implemented with explicit, testable logic. Along the way, I’ll share the trade-offs I had to face: when Spark-native expressions are enough, when UDFs are unavoidable, and how small architectural decisions can make a notebook either evolvable—or brittle.

You’ll see practical patterns for:

  • Modeling transformation rules outside the core logic
  • Applying dynamic mappings and aggregations safely
  • Keeping performance under control as complexity grows
  • Structuring notebook code so it can actually be tested
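
As a rough illustration of the "rules as data" idea, here is a minimal sketch and not the speaker's actual implementation. It assumes rules arrive as plain dictionaries (for example, parsed from JSON) and are compiled into Spark SQL expression strings. The names `Rule`, `load_rules`, `compile_rule`, and the `copy`/`upper`/`map` operations are all hypothetical. The compile step is pure Python, so it can be unit-tested without a Spark session.

```python
import re
from dataclasses import dataclass

# Hypothetical rule model: every name here is an assumption for illustration.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass(frozen=True)
class Rule:
    target: str              # output column name
    op: str                  # "copy", "upper", or "map"
    source: str              # input column name
    mapping: tuple = ()      # (key, value) pairs, used only by "map"
    default: str = "UNKNOWN"  # fallback value for unmapped keys


def _ident(name: str) -> str:
    # Reject anything that is not a plain identifier instead of trying to escape it.
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe column name: {name!r}")
    return f"`{name}`"


def _literal(value: str) -> str:
    # Conservative: refuse quotes and backslashes rather than guess at escaping rules.
    if "'" in value or "\\" in value:
        raise ValueError(f"unsupported literal: {value!r}")
    return f"'{value}'"


def compile_rule(rule: Rule) -> str:
    """Turn one rule into a SQL expression string of the form '<expr> AS <target>'."""
    src, tgt = _ident(rule.source), _ident(rule.target)
    if rule.op == "copy":
        return f"{src} AS {tgt}"
    if rule.op == "upper":
        return f"upper({src}) AS {tgt}"
    if rule.op == "map":
        if not rule.mapping:
            raise ValueError("a 'map' rule needs at least one mapping entry")
        branches = " ".join(
            f"WHEN {src} = {_literal(k)} THEN {_literal(v)}" for k, v in rule.mapping
        )
        return f"CASE {branches} ELSE {_literal(rule.default)} END AS {tgt}"
    raise ValueError(f"unknown op: {rule.op!r}")


def load_rules(raw: list) -> list:
    """Validate raw config dictionaries into Rule objects (mapping keys are sorted)."""
    return [
        Rule(
            target=r["target"],
            op=r["op"],
            source=r["source"],
            mapping=tuple(sorted(r.get("mapping", {}).items())),
            default=r.get("default", "UNKNOWN"),
        )
        for r in raw
    ]


# Example configuration, as it might look after json.load(...).
RAW_RULES = [
    {"target": "name_upper", "op": "upper", "source": "name"},
    {"target": "country_name", "op": "map", "source": "country_code",
     "mapping": {"IT": "Italy", "FR": "France"}},
]

EXPRESSIONS = [compile_rule(rule) for rule in load_rules(RAW_RULES)]
```

With PySpark available, strings like these could be passed to `df.selectExpr(*EXPRESSIONS)`, which keeps the work in Spark-native expressions rather than Python loops or UDFs. Because the compile step only produces strings, the rule logic can be covered by ordinary unit tests outside the notebook.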

This is not a “do it my way” talk. It’s a reflection on mistakes, constraints, and design decisions that emerged from a real problem—and lessons that apply to many PySpark transformation workflows.

The talk is aimed at data engineers and developers with basic Spark experience who want to move beyond ad-hoc transformations toward more robust and maintainable designs.

Tags: Data Engineering
Participant

THOMAS SCARDONI

Once, my six-year-old daughter asked me what I do for a living. I replied that, just as recipes and creativity are needed in the kitchen to make tasty and balanced dishes, my job is to create new recipes to get computers to do extraordinary things for us. After all, isn’t that what developers do?

Applying this metaphor to data, which has always been a keen interest of mine, what I love most is making data accessible and understandable in a world where its exponential growth risks making us lose our bearings.

At the same time, my curiosity about new technologies drives me to explore and experiment with innovative solutions, which I try to apply to the projects I work on, knowing full well that there is no progress without knowledge.

I love music, which I play mainly on the pipe organ or piano, and I enjoy reading popular science books, having always been fascinated by cosmology and how things work.