Answerability-First RAG for Mixed Text and Tables

Most RAG failures are not retrieval errors but answerability problems. This talk presents an answerability-first, two-agent RAG architecture in Python that validates questions against document units, enabling semantic abstention and reducing hallucinations in mixed text–table reports.

In this talk, I present an answerability-first, two-agent RAG system, built in Python, for documents that mix text and tables. The core idea is simple, yet often missing from RAG systems: instead of asking which document looks most similar to a question, the system asks which part of a document can actually answer it.

This shift matters most for legal, compliance, and governance documents, where reports can look very different from one another yet are expected to answer the same recurring questions. In these settings, a plausible-looking answer is not enough: the real challenge is to verify whether the document actually contains the required information, or whether no supported answer exists at all.

The talk shows how separating an offline step from an online decision step makes this possible. Text paragraphs and tables are treated as equal units, and answers are grounded only in exact quotes taken from the document. At question time, a dedicated agent checks the user’s question against each document unit, whether a text paragraph or a table, and assigns each one an explicit answerability score. This agent does not write answers; it only decides whether a unit can answer the question. When no unit supports the question, the system explicitly abstains instead of guessing.
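The per-unit decision step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system presented in the talk: the names `Unit`, `decide`, `is_exact_quote`, and `toy_judge`, the 0.5 threshold, and the flattening of tables into one line per row are all hypothetical. The judge is passed in as a plain callable; `toy_judge` is a keyword-overlap stand-in used only so the example runs, and it is exactly the kind of surface matching that a real answerability agent is meant to replace.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Unit:
    """Hypothetical document unit: a text paragraph or a table."""
    unit_id: str
    kind: str     # "text" or "table"
    content: str  # assumption: tables flattened to one "cell | cell" line per row


# A judge maps (question, unit) to an answerability score in [0, 1].
# In the talk this role belongs to a dedicated agent; here it is any callable.
Judge = Callable[[str, Unit], float]


def decide(question: str, units: list[Unit], judge: Judge,
           threshold: float = 0.5) -> dict:
    """Score every unit and keep those at or above the (hypothetical) threshold.

    The judge never writes an answer; it only says whether a unit can answer.
    """
    scored = [(u, judge(question, u)) for u in units]
    kept = sorted((pair for pair in scored if pair[1] >= threshold),
                  key=lambda pair: pair[1], reverse=True)
    if not kept:
        # Semantic abstention: no unit supports the question, so refuse to guess.
        return {"answerable": False, "units": []}
    return {"answerable": True, "units": [u.unit_id for u, _ in kept]}


def is_exact_quote(quote: str, unit: Unit) -> bool:
    """An answer may cite a unit only if the quote appears in it verbatim."""
    return bool(quote) and quote in unit.content


def toy_judge(question: str, unit: Unit) -> float:
    """Stand-in judge (hypothetical): share of longer question words found in the unit."""
    terms = {w.strip("?.,").lower() for w in question.split() if len(w) > 3}
    text = unit.content.lower()
    return sum(t in text for t in terms) / len(terms) if terms else 0.0


units = [
    Unit("p1", "text", "The board met four times during the reporting year."),
    Unit("t1", "table", "Fine | Amount (EUR)\nLate filing | 12000"),
]
print(decide("How many times did the board meet?", units, toy_judge))
# {'answerable': True, 'units': ['p1']}
print(decide("What was the executive pay ratio?", units, toy_judge))
# {'answerable': False, 'units': []}
print(is_exact_quote("four times", units[0]))
# True
```

With the toy judge, the board-meeting question resolves to paragraph `p1`, while the pay-ratio question matches no unit and the sketch abstains. Keeping the judge behind a plain callable means an LLM-backed agent could be swapped in without touching the abstention logic or the exact-quote check.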

Finally, the approach introduces an offline Question Graph that accumulates common, answerable question types over time. This allows validated questions to be reused, documents to be compared against the same information needs, and question answering over mixed text-and-table documents to become more reliable and easier to audit.
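One possible shape for such a Question Graph is a bipartite mapping from question types to the documents, and units, in which each type was validated as answerable. The sketch below is an assumption rather than the design described in the talk: `QuestionGraph`, its methods, and the example question types and document IDs are all hypothetical.

```python
from collections import defaultdict


class QuestionGraph:
    """Hypothetical sketch: question types linked to documents where they were validated."""

    def __init__(self) -> None:
        # question type -> validated phrasings that can be reused later
        self.phrasings: dict[str, set[str]] = defaultdict(set)
        # question type -> {document id: supporting unit ids}
        self.answered_in: dict[str, dict[str, list[str]]] = defaultdict(dict)

    def record(self, qtype: str, phrasing: str, doc_id: str, unit_ids: list[str]) -> None:
        """Store one (question, document) pair that passed the answerability check."""
        self.phrasings[qtype].add(phrasing)
        self.answered_in[qtype][doc_id] = list(unit_ids)

    def coverage(self, doc_id: str) -> set[str]:
        """Question types validated as answerable in this document."""
        return {q for q, docs in self.answered_in.items() if doc_id in docs}

    def gaps(self, doc_id: str) -> set[str]:
        """Question types validated elsewhere but not answerable in this document."""
        return set(self.answered_in) - self.coverage(doc_id)

    def compare(self, doc_a: str, doc_b: str) -> dict[str, set[str]]:
        """Compare two documents against the same set of information needs."""
        a, b = self.coverage(doc_a), self.coverage(doc_b)
        return {"shared": a & b, "only_a": a - b, "only_b": b - a}


g = QuestionGraph()
g.record("board_meetings", "How many times did the board meet?", "report_2023", ["p1"])
g.record("board_meetings", "Number of board meetings held?", "report_2024", ["t3"])
g.record("late_filing_fines", "What fines were paid for late filing?", "report_2023", ["t1"])
print(g.compare("report_2023", "report_2024"))
# {'shared': {'board_meetings'}, 'only_a': {'late_filing_fines'}, 'only_b': set()}
print(g.gaps("report_2024"))
# {'late_filing_fines'}
```

Here `compare` shows that both reports answer the board-meetings question, while `gaps("report_2024")` flags the late-filing question as validated elsewhere but not answerable in the 2024 report. Storing phrasings per question type is what would let a validated question be reused verbatim on the next document.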

Friday, May 29

11:40 - 12:25

Maria Vallarelli