Most RAG failures are not retrieval errors but answerability problems. This talk presents an answerability-first, two-agent RAG architecture in Python that validates questions against document units, enabling semantic abstention and reducing hallucinations in mixed text–table reports.
In this talk, I present an answerability-first, two-agent RAG system for documents that mix text and tables, built in Python. The main idea is simple but often missing in RAG systems: instead of asking which document looks most similar to a question, the system asks which part of the document can actually answer it.
This shift matters most for legal, compliance, and governance documents, where reports may look very different but are expected to answer the same recurring questions. In these cases, a plausible-looking answer is not enough. The real challenge is to check whether the document truly contains the needed information, or whether no supported answer exists at all.
The talk shows how separating an offline step from an online decision step makes this possible. Text paragraphs and tables are treated as equal units, and answers consist only of exact quotes taken from the document. At question time, a dedicated agent checks the user’s question against each document unit, whether text paragraph or table, and assigns each one an explicit answerability score. The agent does not write answers; it only decides whether a unit can answer the question. When no unit supports the question, the system abstains instead of guessing.
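The online decision step described above can be sketched roughly as follows. This is a minimal illustration, not the system presented in the talk: the names `Unit`, `overlap_score`, `select_unit`, `is_exact_quote`, and the `0.5` threshold are all assumptions made for the example. In particular, `overlap_score` is a toy lexical stand-in so the code runs without a model; the real answerability agent would be plugged in through the `scorer` parameter.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Unit:
    """One document unit; text paragraphs and tables share the same type."""
    unit_id: str
    kind: str      # "text" or "table"
    content: str   # paragraph text, or a table serialized as text


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(question: str, unit: Unit) -> float:
    """Toy placeholder scorer (assumption): share of question tokens found in the unit."""
    q = _tokens(question)
    return len(q & _tokens(unit.content)) / len(q) if q else 0.0


def select_unit(
    question: str,
    units: Sequence[Unit],
    scorer: Callable[[str, Unit], float] = overlap_score,
    threshold: float = 0.5,  # hypothetical cut-off, to be tuned
) -> Optional[tuple[Unit, float]]:
    """Score every unit; return the best one, or None to abstain."""
    scored = [(unit, scorer(question, unit)) for unit in units]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


def is_exact_quote(quote: str, unit: Unit) -> bool:
    """Accept an answer only if it appears verbatim (up to whitespace) in the unit."""
    def norm(s: str) -> str:
        return " ".join(s.split())
    return bool(quote.strip()) and norm(quote) in norm(unit.content)


# Invented example data, for illustration only.
UNITS = [
    Unit("p1", "text", "The board met twelve times in 2023."),
    Unit("t1", "table", "Director | Attendance\nRossi | 11/12"),
]

hit = select_unit("How many times did the board meet in 2023?", UNITS)
miss = select_unit("What is the CEO salary?", UNITS)  # no unit clears the threshold
```

Here `hit` points at the paragraph unit, while `miss` is `None`, which is the signal for the system to refuse rather than guess. Keeping the scorer separate from answer writing mirrors the division of labour described above: one component decides answerability, and any answer is then constrained by `is_exact_quote`.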
Finally, the approach introduces an offline Question Graph that collects common, answerable question types over time. This allows validated questions to be reused and documents to be compared against the same information needs, and it makes question answering over documents that mix text and tables more reliable and easier to check.
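One way to picture the Question Graph is as a bipartite structure linking normalized questions to the documents (and units) where they were validated as answerable. The sketch below assumes that reading; the class layout, the method names, and the simple string normalization are hypothetical choices for illustration, not the design presented in the talk.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional


class QuestionGraph:
    """Hypothetical offline store: question -> {doc_id: unit_id that answered it}."""

    def __init__(self) -> None:
        self._edges: dict[str, dict[str, str]] = defaultdict(dict)

    @staticmethod
    def normalize(question: str) -> str:
        # Assumption: lowercase, trim end punctuation, collapse whitespace.
        return " ".join(question.lower().strip(" ?!.").split())

    def record(self, doc_id: str, question: str, unit_id: str) -> None:
        """Store a question that was validated as answerable from a unit."""
        self._edges[self.normalize(question)][doc_id] = unit_id

    def lookup(self, doc_id: str, question: str) -> Optional[str]:
        """Reuse a validated question: return the supporting unit, if any."""
        return self._edges.get(self.normalize(question), {}).get(doc_id)

    def common_questions(self, min_docs: int = 2) -> list[str]:
        """Questions answerable in at least `min_docs` documents."""
        return sorted(q for q, docs in self._edges.items() if len(docs) >= min_docs)

    def shared_questions(self, doc_a: str, doc_b: str) -> list[str]:
        """Information needs on which two documents can be compared."""
        return sorted(
            q for q, docs in self._edges.items() if doc_a in docs and doc_b in docs
        )


# Invented example data, for illustration only.
graph = QuestionGraph()
graph.record("report_2022", "How many board meetings were held?", "p4")
graph.record("report_2023", "how many board meetings were held", "t2")
graph.record("report_2023", "Who chairs the audit committee?", "p9")

shared = graph.shared_questions("report_2022", "report_2023")
```

In this toy run, the two reports share one question even though one answers it from a paragraph and the other from a table, which is the kind of cross-document comparison the Question Graph is meant to support.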
I’m a Data Scientist working at Eni Plenitude. I build machine learning and data-driven systems, with a strong focus on reliable decision-making and real-world impact. I enjoy building and fixing ML pipelines, especially when the data is complex, imperfect, or comes from real documents, tables, and time series.
I’m very interested in energy systems and energy communities, and in how data and analytics can help make better, more efficient use of renewable energy. What motivates me most is the challenge of exploring complex problems and figuring out how systems really work. Outside of work, I love playing the guitar, playing tennis, and quietly observing nature, which often inspires me to capture it in a photo.