Talk

Let LLMs wander - An introduction to Reinforcement Learning Environments

Saturday, May 30

15:35 - 16:20
Room: Spaghetti
Language: English
Audience level: Intermediate
Elevator pitch

What if, instead of learning only from examples, Language Models could explore crafted Environments, little worlds where they can act and improve autonomously?

Join me to see how Reinforcement Learning Environments work, how to build them, and how to use them to evaluate and train LLMs/Agents.

Abstract

Since the release of reasoning Language Models like DeepSeek R1, we’ve seen improvements in models’ capabilities in math, code, and long-running agentic tasks.

Part of this progress comes from Reinforcement Learning with Verifiable Rewards and Environments. While the classical training recipe focuses on learning from examples, in this new, complementary paradigm the LLM/Agent interacts with an Environment: it takes actions, receives rewards, and learns from the process.
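To make the loop concrete, here is a minimal, self-contained sketch of the interaction described above. Everything in it is hypothetical for illustration: `ToyMathEnv` and `mock_policy` are invented names, and the "policy" is a stand-in for a real LLM, not an actual model call.

```python
import random

class ToyMathEnv:
    """Hypothetical single-turn Environment: the model answers an
    addition question and gets a verifiable reward (1.0 or 0.0)."""

    def reset(self) -> str:
        # Sample a new task and return the observation (the prompt).
        self.a, self.b = random.randint(0, 9), random.randint(0, 9)
        return f"What is {self.a} + {self.b}?"

    def step(self, action: str) -> float:
        # Verifiable reward: check the answer programmatically,
        # no human labels or learned reward model needed.
        try:
            return 1.0 if int(action) == self.a + self.b else 0.0
        except ValueError:
            return 0.0

def mock_policy(prompt: str) -> str:
    # Stand-in for an LLM: extracts the numbers and adds them.
    nums = [int(t.strip("?")) for t in prompt.split() if t.strip("?").isdigit()]
    return str(sum(nums))

# The act -> reward loop; in real training, these rewards would
# drive a policy-gradient update (e.g. GRPO/PPO) on the model.
env = ToyMathEnv()
rewards = []
for _ in range(5):
    prompt = env.reset()
    answer = mock_policy(prompt)
    rewards.append(env.step(answer))
```

Real Environments follow the same reset/step/reward shape; the differences lie in how rich the observations are and how the reward is verified.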

Fascinating, no? I think so, and I’ve spent some time exploring this space. In this talk, I will walk you through my journey into Environments from a practical point of view.

We’ll see together:

  • How classic Reinforcement Learning concepts translate to Language Models
  • Verifiers, an open-source library to build Environments as software artifacts
  • Concrete examples of Environments, from single-turn tasks to multi-turn games and tool-using agents
  • Evaluation using Environments
  • Training Small Language Models with Environments

By the end, you’ll be able to start building your own RL environments, little worlds for LLMs. I’ll also share the joys, frustrations, and lessons learned along the way.

Tags: ML and AI, Applications and Libraries
Participant

Stefano Fiorucci

🧑‍🚀 AI Engineer/explorer. Passionate about Language Models, open source, and knowledge sharing.

👨‍💻 I work on the open-source Haystack LLM orchestration framework, contributing code, tutorials and demos.

🧭 Fascinated by all things LLMs. From inference and orchestration (Agents, RAG) to post-training techniques. I frequently experiment with training small Language Models and Reinforcement Learning. I like sharing what I learn.