Wire Riot

paper

arXiv cs.CL

November 18th, 2025 at 5:00 AM

When Facts Change: Probing LLMs on Evolving Knowledge with evolveQA

arXiv:2510.19172v2 (announce type: replace)

Abstract: LLMs often fail to handle temporal knowledge conflicts, that is, contradictions that arise when facts evolve over time within their training data. Existing studies evaluate this phenomenon with benchmarks built on structured knowledge bases such as Wikidata, but these focus on widely covered, easily memorized popular entities and lack the dynamic structure needed to fairly evaluate LLMs with different knowledge cut-off dates. We introduce evolveQA, a benchmark designed to evaluate LLMs on temporally evolving knowledge, constructed from three real-world, time-stamped corpora: AWS updates, Azure changes, and WHO disease outbreak reports. Our framework identifies naturally occurring knowledge evolution and generates questions with gold answers tailored to different LLM knowledge cut-off dates. Through extensive evaluation of 12 open- and closed-source LLMs across three knowledge probing formats, we demonstrate significant performance drops of up to 31% on evolveQA compared to static knowledge questions.
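To make the idea of gold answers tailored to a knowledge cut-off date concrete, here is a minimal sketch. It is not the authors' pipeline: the `FactVersion` type, the `gold_answer` function, and the storage-quota timeline are all hypothetical and invented for illustration. The sketch assumes a fact is stored as a list of timestamped versions and that the gold answer for a given model is the latest version in effect on or before that model's cut-off.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class FactVersion:
    """One timestamped value of an evolving fact (hypothetical structure)."""
    effective: date
    value: str


def gold_answer(timeline: List[FactVersion], cutoff: date) -> Optional[str]:
    """Return the latest value in effect on or before `cutoff`.

    Returns None when the fact did not exist yet at the cut-off date.
    """
    ordered = sorted(timeline, key=lambda v: v.effective)
    dates = [v.effective for v in ordered]
    # bisect_right counts the versions with effective date <= cutoff.
    idx = bisect_right(dates, cutoff)
    return ordered[idx - 1].value if idx else None


# Made-up timeline for illustration only; not data from the benchmark.
example_timeline = [
    FactVersion(date(2023, 1, 10), "5 GB"),
    FactVersion(date(2024, 6, 1), "10 GB"),
    FactVersion(date(2025, 3, 15), "20 GB"),
]

if __name__ == "__main__":
    print(gold_answer(example_timeline, date(2024, 12, 31)))  # 10 GB
    print(gold_answer(example_timeline, date(2025, 6, 1)))    # 20 GB
    print(gold_answer(example_timeline, date(2022, 1, 1)))    # None
```

Under this assumption, two models with different cut-offs can be asked the same question and scored against different gold answers, so neither is penalized for facts that changed after its training data ended.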

#ai

#llm

#open_source

Canonical link: https://arxiv.org/abs/2510.19172