Last updated June 13, 2024
Comprehensive RAG Benchmark Aims to Advance Retrieval-Augmented Question Answering

Meta AI Introduces New Dataset and Evaluation to Push Limits of Knowledge-Grounded Language Models

Published on June 10, 2024

by Gopika Raj

Researchers at Meta AI have created CRAG (Comprehensive Retrieval-Augmented Generation Benchmark), a new benchmark for retrieval-augmented question answering systems, which combine large language models with external knowledge sources.

The goal is to develop more reliable and trustworthy question answering capabilities that overcome hallucinations and knowledge gaps in today’s language models.

The CRAG benchmark consists of 4,409 question-answer pairs spanning finance, sports, music, movies, and general topics.

It includes diverse question types such as comparisons, aggregations, multi-hop queries, and questions built on false premises. The facts it covers range in dynamism from real-time to static, and the entities range in popularity from head to long-tail.

Crucially, CRAG provides mock web search results and APIs to simulate retrieving information from the internet and knowledge graphs. This allows benchmarking the full pipeline of retrieval, synthesis, and generation required for knowledge-grounded question answering.
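The three stages named above (retrieval, synthesis, generation) can be illustrated with a toy sketch. It is not CRAG's actual data format or API: the `MockPage` schema, the keyword-overlap retriever, and the stubbed `generate` function are all assumptions made for illustration, and a real system would call a language model where the stub is.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MockPage:
    """Stand-in for one mock web search result (hypothetical schema)."""
    title: str
    snippet: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of bare words."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(question: str, pages: list[MockPage], k: int = 2) -> list[MockPage]:
    """Retrieval: rank pages by word overlap with the question, keep the top k."""
    q = tokenize(question)
    scored = [(len(q & tokenize(p.title + " " + p.snippet)), p) for p in pages]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [page for score, page in ranked if score > 0][:k]


def synthesize(pages: list[MockPage]) -> str:
    """Synthesis: merge the retrieved snippets into one context string."""
    return "\n".join(f"- {p.snippet}" for p in pages)


def generate(question: str, context: str) -> str:
    """Generation: placeholder for a language model call (assumption).

    With no supporting context, the stub abstains instead of guessing.
    """
    if not context:
        return "I don't know"
    return f"Based on retrieved context:\n{context}"


def answer(question: str, pages: list[MockPage]) -> str:
    """Run the full pipeline: retrieve, synthesize, generate."""
    return generate(question, synthesize(retrieve(question, pages)))
```

Abstaining when nothing relevant is retrieved is one simple way to trade a hallucinated answer for a missing one, which is the kind of behavior a benchmark with mock retrieval results makes measurable.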

Evaluations highlighted major gaps in current systems. The most advanced language models achieved at most 34% accuracy on CRAG, and adding retrieval augmentation in a straightforward way raised this to only 44%.

Even industry-leading retrieval-augmented systems answered only 63% of questions without hallucinations, struggling especially with dynamic, long-tail, and complex queries.
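To make figures like these concrete, here is a minimal sketch of how per-question outcomes could be turned into accuracy and hallucination rates, overall and per slice (for example, by entity popularity). The three labels, the record layout, and the function names are assumptions for illustration; this is not the benchmark's official scoring code.

```python
from collections import Counter, defaultdict

# Hypothetical outcome labels for a graded answer.
LABELS = ("correct", "missing", "hallucinated")


def summarize(labels):
    """Return the fraction of answers that fall under each label."""
    total = len(labels)
    if total == 0:
        raise ValueError("no labels to summarize")
    counts = Counter(labels)
    return {name: counts[name] / total for name in LABELS}


def summarize_by(records, key):
    """Group records by a field (e.g. 'popularity') and summarize each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["label"])
    return {group: summarize(labels) for group, labels in groups.items()}


# Made-up example records, not real benchmark results.
records = [
    {"popularity": "head", "label": "correct"},
    {"popularity": "head", "label": "correct"},
    {"popularity": "tail", "label": "hallucinated"},
    {"popularity": "tail", "label": "missing"},
]

overall = summarize([r["label"] for r in records])
per_slice = summarize_by(records, "popularity")
```

Breaking the rates down by slice is what surfaces the pattern described above, where systems do worse on long-tail, dynamic, and complex questions than on popular, static ones.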

“CRAG reveals the challenges in building fully trustworthy question answering systems that can reliably incorporate information from the real world,” said Xiao Yang, a research scientist at Meta AI and co-lead of the project. “We hope this benchmark spurs innovation and allows tracking progress toward this critical goal.”

The CRAG dataset formed the basis for the KDD Cup 2024 challenge hosted by Meta AI, attracting thousands of participants working to advance retrieval-augmented generation capabilities. The researchers plan to continue expanding and improving CRAG to push forward research in this area.
