Dynamic Language Understanding for Evaluating Question Answering Models

A report from Google’s DeepMind outlines a new method of evaluating how AIs understand human language. DeepMind researchers, led by Elena Gribovskaya and Tomáš Kočiský, presented their findings in a paper with implications for how question answering models keep up with changing information.

One kind of useful AI is a bot that can read and ‘understand’ human language in the form of written information. These types of bots help us find the information we need by scanning the vast amounts of information available online and selecting the most relevant and useful answers. We see this kind of work performed by web search algorithms and virtual assistants.

These kinds of AIs (referred to as ‘language models’) are typically evaluated for knowledge and language understanding via question answering (QA), i.e., they answer questions based on a piece of information, such as a Wikipedia or news article. However, knowledge does not remain static over time. It grows and evolves with additions and revisions, so the knowledge of these bots becomes outdated.

DeepMind constructed a new large-scale dataset, called StreamingQA, to test how models could adapt to evolving knowledge. It used human-composed questions and gave the AI 14 years of time-stamped news articles to draw information from for its answers. The models were evaluated on a quarterly basis as they read new articles and updated their knowledge base.
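The quarterly protocol described above can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not DeepMind's implementation: the `Article`, `Question`, and `ToyRetrievalModel` names, the word-overlap retrieval, and the "gold answer appears in the retrieved article" score are all hypothetical stand-ins chosen for brevity.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:  # hypothetical record type, not from the paper
    published: date
    text: str


@dataclass
class Question:  # hypothetical record type, not from the paper
    asked: date
    text: str
    answer: str


def quarter_of(d):
    """Map a date to a (year, quarter) key, e.g. 2020-05-01 -> (2020, 2)."""
    return (d.year, (d.month - 1) // 3 + 1)


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


class ToyRetrievalModel:
    """Assumed stand-in for a QA model: it only 'knows' articles it has read."""

    def __init__(self):
        self.articles = []

    def read(self, articles):
        self.articles.extend(articles)

    def answer(self, question_text):
        """Return the text of the best-matching article (ties go to the newest)."""
        if not self.articles:
            return ""
        q = tokens(question_text)
        best = max(self.articles, key=lambda a: (len(q & tokens(a.text)), a.published))
        return best.text


def evaluate_quarterly(model, articles, questions):
    """Each quarter: read that quarter's articles, then score that quarter's questions."""
    articles_by_quarter = defaultdict(list)
    for a in articles:
        articles_by_quarter[quarter_of(a.published)].append(a)
    questions_by_quarter = defaultdict(list)
    for q in questions:
        questions_by_quarter[quarter_of(q.asked)].append(q)

    scores = {}
    for key in sorted(set(articles_by_quarter) | set(questions_by_quarter)):
        model.read(articles_by_quarter[key])
        asked = questions_by_quarter[key]
        if asked:
            hits = sum(q.answer.lower() in model.answer(q.text).lower() for q in asked)
            scores[key] = hits / len(asked)
    return scores


# Made-up example data: the correct answer changes between quarters.
articles = [
    Article(date(2020, 1, 15), "The city council elected Ana Ruiz as mayor"),
    Article(date(2020, 4, 10), "Ben Cho became the mayor after Ana Ruiz resigned"),
]
questions = [
    Question(date(2020, 2, 1), "Who is the mayor", "Ana Ruiz"),
    Question(date(2020, 5, 1), "Who is the mayor", "Ben Cho"),
]

scores = evaluate_quarterly(ToyRetrievalModel(), articles, questions)
print(scores)
```

In this sketch, a model that stopped reading after the first quarter would still retrieve the older article for the second question and miss the updated answer, which is the kind of staleness a time-stamped benchmark is meant to expose.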

The results on StreamingQA showed that both parametric and semi-parametric models were able to adapt to new information without full retraining. Updating was particularly useful for high-frequency topics that continue to be relevant over long periods of time. In a world of rapidly updating, ever-expanding information where we increasingly rely on technology to sort signals from noise, StreamingQA and methods like it could help our virtual assistants and web searches be more effective.

Luis Paradela