Talk

Let LLMs wander - An introduction to Reinforcement Learning Environments

Saturday, May 30

15:35 - 16:20
Room: Spaghetti
Language: English
Audience level: Intermediate
Elevator pitch

What if, instead of learning only from examples, Language Models could explore crafted Environments, little worlds where they can act and improve autonomously?

Join me to see how Reinforcement Learning Environments work, how to build them, and how to use them to evaluate and train LLMs/Agents.

Abstract

Since the release of reasoning Language Models like DeepSeek R1, we’ve seen improvements in models’ capabilities in math, code, and long-running agentic tasks.

Part of this progress comes from Reinforcement Learning with Verifiable Rewards and Environments. While the classical training recipe focuses on learning from examples, in this new, complementary paradigm the LLM/Agent interacts with an Environment: it takes actions, receives rewards, and learns from the process.
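To make the loop concrete, here is a minimal, self-contained sketch of the interaction described above. Everything in it is hypothetical for illustration: `ToyMathEnv` and `mock_policy` are invented names, and the "policy" is a stand-in for a real LLM, not an actual model call.

```python
import random

class ToyMathEnv:
    """Hypothetical single-turn Environment: the model answers an
    addition question and gets a verifiable reward (1.0 or 0.0)."""

    def reset(self) -> str:
        # Sample a new task and return the observation (the prompt).
        self.a, self.b = random.randint(0, 9), random.randint(0, 9)
        return f"What is {self.a} + {self.b}?"

    def step(self, action: str) -> float:
        # Verifiable reward: check the answer programmatically,
        # no human labels or learned reward model needed.
        try:
            return 1.0 if int(action) == self.a + self.b else 0.0
        except ValueError:
            return 0.0

def mock_policy(prompt: str) -> str:
    # Stand-in for an LLM: extracts the numbers and adds them.
    nums = [int(t.strip("?")) for t in prompt.split() if t.strip("?").isdigit()]
    return str(sum(nums))

# The act -> reward loop; in real training, these rewards would
# drive a policy-gradient update (e.g. GRPO/PPO) on the model.
env = ToyMathEnv()
rewards = []
for _ in range(5):
    prompt = env.reset()
    answer = mock_policy(prompt)
    rewards.append(env.step(answer))
```

Real Environments follow the same reset/step/reward shape; the differences lie in how rich the observations are and how the reward is verified.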

Fascinating, no? I think so, and I’ve spent some time exploring this space. In this talk, I will walk you through my journey into Environments from a practical point of view.

We’ll see together:

  • How classic Reinforcement Learning concepts translate to Language Models
  • Verifiers, an open-source library to build Environments as software artifacts
  • Concrete examples of Environments, from single-turn tasks to multi-turn games and tool-using agents
  • Evaluation using Environments
  • Training Small Language Models with Environments

By the end, you’ll be able to start building your own RL environments, little worlds for LLMs. I’ll also share the joys, frustrations, and lessons learned along the way.

Tags: ML and AI, Applications and Libraries
Participant

Stefano Fiorucci

🧑‍🚀 AI Engineer/explorer. Passionate about Language Models, open source, and knowledge sharing.

👨‍💻 I work on the open-source Haystack LLM orchestration framework, contributing code, tutorials and demos.

🧭 Fascinated by all things LLMs. From inference and orchestration (Agents, RAG) to post-training techniques. I frequently experiment with training small Language Models and Reinforcement Learning. I like sharing what I learn.