TimeStress
TimeStress is a dataset for evaluating how large language models (LLMs) represent facts over time. It tests whether a model can distinguish correct from incorrect factual statements that are contextualized with a date and formatted as questions, such as “In 2011, who was the president of the USA? Barack Obama”. The evaluation principle is that […]
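To make the statement format concrete, here is a minimal Python sketch that assembles a dated question of the kind quoted above. The `DatedFact` record, its field names, and the `format_statement` helper are hypothetical and chosen only for illustration; they are not the dataset's actual schema or tooling. The incorrect variant is produced by changing the year, which is an assumption about one way such a statement could arise, not a description of how TimeStress builds them.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatedFact:
    # Hypothetical record: field names are illustrative, not the dataset's schema.
    year: int
    question: str
    answer: str


def format_statement(fact: DatedFact) -> str:
    """Prefix the question with its date and append the candidate answer."""
    return f"In {fact.year}, {fact.question} {fact.answer}"


# The example quoted in the description above.
correct = DatedFact(
    year=2011,
    question="who was the president of the USA?",
    answer="Barack Obama",
)

# Assumption for illustration: same question and answer, but a year in which
# the answer does not hold, giving an incorrect statement.
incorrect = replace(correct, year=1995)

print(format_statement(correct))
print(format_statement(incorrect))
```

Both strings share the same question and answer and differ only in the date, so any difference in how a model treats them can be attributed to the temporal context.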