Workshop

Picasso-Grade Scraping 🎨📊: From Online Art Galleries to Production-Ready Data Pipelines

Friday, May 29

11:00 - 13:00
Room: Piadina
Language: English
Audience level: Intermediate
Elevator pitch

Let’s scrape a real online art gallery 🎨 and turn chaotic artwork pages into analytics-ready data. We’ll build a modern Python scraper 🐍, tame messy size strings with 🧠 spaCy, and ship 🚢 everything into SQL for insights 📊.

Abstract

Online art galleries are great for humans and a nightmare for machines 😅: every artwork page is a bespoke little chaos of titles, prices and wildly inconsistent size strings. In this workshop, we’ll take a real gallery (Artmajeur) and turn it into a clean, analytics-ready dataset, step by step, with modern Python tooling. 🎨🐍📊

We’ll start from an empty repo 🧱 and design a production-minded scraper: picking a modern scraping framework ⚙️, wiring up a single spider 🕷️, and hunting in DevTools 🔍 for the selectors that lead us from listing pages to artwork detail pages. From there, we’ll shape a clear data model — artist, title, price, raw size text, image URL — and build a small but realistic pipeline around it.
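As a rough illustration of the data model described above, here is a minimal sketch using only the standard library. The class name `ArtworkItem`, the field names, and the sample values are hypothetical and not taken from the workshop material; the actual item definition will depend on the scraping framework chosen.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ArtworkItem:
    """One scraped artwork detail page (hypothetical field names)."""

    artist: str
    title: str
    price: Optional[float]  # None when the page shows no price
    raw_size_text: str      # kept verbatim; normalized in a later step
    image_url: str


# Made-up sample values, only to show the shape of one record.
item = ArtworkItem(
    artist="Example Artist",
    title="Untitled",
    price=1200.0,
    raw_size_text="50 x 70 cm",
    image_url="https://example.com/artwork.jpg",
)

# A flat dict is convenient for writing rows to SQL or CSV later on.
row = asdict(item)
```

Keeping the raw size text untouched in the item means the normalization step can be re-run later without re-scraping.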

Then comes the fun part 🤹‍♀️: we’ll plug in a spaCy-powered NLP layer 🧠 to tame messy dimension strings and normalize them into consistent width_cm, height_cm, depth_cm, orientation, and size buckets — ready for SQL and dashboards.
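To make the target of the normalization step concrete, here is a small sketch of the output shape. Note that it swaps in plain regular expressions as a stand-in, not the spaCy layer the workshop builds. The function name `parse_size`, the width-then-height ordering, and the bucket thresholds are all assumptions made for illustration.

```python
import re
from typing import Optional

INCH_TO_CM = 2.54
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")
# Inches if the text contains "in", "inch", "inches" or a double quote.
INCH_UNIT = re.compile(r'(?<![a-z])(?:in|inch|inches)\b|"')


def parse_size(raw: str) -> Optional[dict]:
    """Normalize a raw size string into centimetre fields.

    Assumptions: numbers appear as width, height, optional depth,
    and the whole string uses a single unit (cm unless inches are named).
    """
    text = raw.lower()
    numbers = [float(n.replace(",", ".")) for n in NUMBER.findall(text)[:3]]
    if len(numbers) < 2:
        return None  # not enough information to build dimensions

    factor = INCH_TO_CM if INCH_UNIT.search(text) else 1.0
    dims = [round(n * factor, 1) for n in numbers]
    width, height = dims[0], dims[1]
    depth = dims[2] if len(dims) == 3 else None

    if width > height:
        orientation = "landscape"
    elif height > width:
        orientation = "portrait"
    else:
        orientation = "square"

    # Hypothetical thresholds on the longest side, in cm.
    longest = max(width, height)
    if longest < 40:
        bucket = "small"
    elif longest < 100:
        bucket = "medium"
    else:
        bucket = "large"

    return {
        "width_cm": width,
        "height_cm": height,
        "depth_cm": depth,
        "orientation": orientation,
        "size_bucket": bucket,
    }


metric = parse_size("50 x 70 cm")
imperial = parse_size("20 x 10 x 2 in")
```

A regex like this breaks quickly on real listings (mixed units, ranges, thousands separators, free-text notes), which is the kind of mess the NLP layer is meant to handle.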

You’ll walk away with a practical blueprint 🗺️ for going from “pretty website I’d like to scrape” to a robust, Picasso-Grade Python data pipeline that your future self (and your analysts) will quietly thank you for. 💫

Tags: Scaling, Web Frameworks, Data Engineering
Participant

Viktor Zagranovskyy

I’m a Python engineer and data platform lead working on large-scale web scraping and AI-powered extraction pipelines for e-commerce and retail. Every day, my team and I battle messy HTML, bot protection and LLMs to turn millions of product pages into reliable, analytics-ready data. I live in Italy and, outside of work, I am a father. I enjoy good D&D games, and I plan to re-read Pratchett’s whole Discworld series when I retire.