Talk

No, you can’t ‘eval’ your way to fairness

Saturday, May 30

15:35 - 16:20
Room: Tagliatelle
Language: English
Audience level: Beginner
Elevator pitch

No, your fairness eval library isn’t making your system fair. Fairness is felt, not calculated. A talk for people who suspect we can’t optimise our way to human dignity.

Abstract

Cold open: Fairness is fundamentally not tractable to classic optimisation techniques.

The part where I lose half the room: Fairness is not a state of the world; it’s an experience of it. No technology is fair in a vacuum. Fairness can only be understood when a technical system collides with humans in the world. It is felt as much as it is calculated. We can look at statistical results in aggregate to understand patterns, but these do not tell the story of the individual.

Further, attempting to optimise numerical fairness metrics is fundamentally coercive and technocratic: it puts our thumb on the scale globally, injects “positive bias” along single dimensions, and frames fairness as a data problem rather than a problem of human dignity. This approach fails to acknowledge differences in preference, culture, and experience. To build systems that support human agency, we must first abandon the idea of a single moral machine that consistently outputs the “right” answer from inputs plus algorithms. I’d argue any system treating people as fungible or undifferentiated is structurally unfair. Instead, we need to prioritise transparency, explanation, and empowerment.
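To ground what “optimising numerical fairness metrics” means in practice, here is a minimal sketch (my illustration, not the talk’s) of one classic metric, demographic parity difference. The function and data are hypothetical; notice that the metric only ever sees group-level rates, never the individual.

```python
# Minimal sketch of a classic numerical fairness metric: demographic
# parity difference. Everything here is illustrative, not from the talk.
from collections import defaultdict


def demographic_parity_difference(outcomes: list[int], groups: list[str]) -> float:
    """Largest gap in positive-outcome rates between any two groups.

    `outcomes` are binary decisions (1 = favourable); `groups` holds a
    group label for each individual. 0.0 is "perfectly fair" by this
    metric, regardless of how any individual experienced the decision.
    """
    totals: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for outcome, group in zip(outcomes, groups):
        totals[group] += 1
        positives[group] += outcome
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Equal rates per group score 0.0 ("fair"), whatever each person felt.
print(demographic_parity_difference([1, 0, 1, 0], ["a", "a", "b", "b"]))  # 0.0
```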

What does this have to do with evals? We’re seeing a wave of off-the-shelf libraries that measure bad behaviours in LLM outputs, often built on simplifications of older fairness metrics. And yes, they can catch obvious failure modes like slurs in outputs. But slurs are one failure mode among many. Installing a library and calling the job done is fairness washing. The harder, more fruitful approach is to explore the space of failure modes, consider what an ideal world would look like, and design measures, mitigations, and feedback loops accordingly. It also means grappling with the fact that we cannot avoid doing harm. What we can do is practise harm reduction and humility, striving toward something better while acknowledging the impossibility of the task.
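As an illustration of how thin an off-the-shelf check can be, here is a hedged sketch of a keyword screen over LLM outputs, hypothetical and not modelled on any particular library. It catches exactly the slur failure mode described above and nothing subtler.

```python
# Hedged sketch of an off-the-shelf-style eval: a keyword screen over
# LLM outputs. Hypothetical; not modelled on any specific library.
BLOCKLIST = {"slur1", "slur2"}  # placeholder terms, not real entries


def passes_keyword_eval(output: str) -> bool:
    """True if no blocklisted term appears in the output."""
    words = {word.strip(".,!?\"'").lower() for word in output.split()}
    return words.isdisjoint(BLOCKLIST)


# A condescending, exclusionary answer sails through: the check knows
# nothing about dignity, erasure, or unequal treatment, only keywords.
assert passes_keyword_eval("You probably wouldn't understand the maths.")
print("eval passed")
```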

Third act: This talk won’t offer easy answers. Attend if you want to grapple with the gnarly problems of building systems for humans. We’ll borrow ideas from Design Justice and the disability rights movements: “Nothing about us without us.” Let’s ask and answer better questions. You’ll leave with sharper mental models and tools for the next tricky conversation at work.

Tags: Diversity, Other
Participant

Laura Summers

Laura is a very technical designer™️, working at Pydantic as Lead Design Engineer. Her side projects include Sweet Summer Child Score (summerchild.dev) and Ethics Litmus Tests (ethical-litmus.site). Laura is passionate about feminism, digital rights and designing for privacy. She speaks, writes and runs workshops at the intersection of design and technology.