WikiWars: a new corpus for research on temporal expressions

Authors:
Paweł Mazur;Robert Dale
Affiliations:
Wrocław University of Technology, Wrocław, Poland and Macquarie University, NSW, Sydney, Australia;Macquarie University, NSW, Sydney, Australia
Venue:
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Year:
2010

Citing 6

Extracting meaning from temporal nouns and temporal prepositions
ACM Transactions on Asian Language Information Processing (TALIP) - Special Issue on Temporal Information Processing

Robust temporal processing of news
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics

From Language to Time: A Temporal Expression Anchorer
TIME '06 Proceedings of the Thirteenth International Symposium on Temporal Representation and Reasoning

Learning event durations from event descriptions
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

What's the date?: high accuracy interpretation of weekday names
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1

Automatic time expression labeling for English and Chinese text
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing

Cited 5

Semantics of calendar adverbials for information retrieval
ISMIS'11 Proceedings of the 19th international conference on Foundations of intelligent systems

Supervised language modeling for temporal resolution of texts
Proceedings of the 20th ACM international conference on Information and knowledge management

Tracking entities in web archives: the LAWA project
Proceedings of the 21st international conference companion on World Wide Web

Event-centric search and exploration in document collections
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries

Time for More Languages: Temporal Tagging of Arabic, Italian, Spanish, and Vietnamese
ACM Transactions on Asian Language Information Processing (TALIP)

Abstract

The reliable extraction of knowledge from text requires an appropriate treatment of the time at which reported events take place. Unfortunately, there are very few annotated data sets that support the development of techniques for event time-stamping and tracking the progression of time through a narrative. In this paper, we present a new corpus of temporally rich documents sourced from English Wikipedia, which we have annotated with TIMEX2 tags. The corpus contains around 120,000 tokens and 2,600 TIMEX2 expressions, thus comparing favourably in size to other existing corpora used in these areas. We describe the preparation of the corpus, and compare the profile of the data with other existing temporally annotated corpora. We also report the results obtained when we use DANTE, our temporal expression tagger, to process this corpus, and point to where further work is required. The corpus is publicly available for research purposes.
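
As an illustration of what TIMEX2 annotation looks like in practice, the following minimal Python sketch extracts inline TIMEX2 spans from a short text and computes the two statistics quoted in the abstract (token count and expression count). The sample sentence, the lowercase `val` attribute spelling, and the helper names `extract_timex2` and `count_tokens` are assumptions made for this example; the actual file layout of the corpus may differ, and this is not the authors' tooling.

```python
import re

# Hypothetical sample text with inline TIMEX2 tags (not taken from the corpus).
SAMPLE = (
    'The attack began on <TIMEX2 val="1941-12-07">7 December 1941</TIMEX2> '
    'and ended <TIMEX2 val="P1D">a day</TIMEX2> later.'
)

# Matches one non-nested TIMEX2 element; nested expressions, which the
# TIMEX2 guidelines allow, are deliberately not handled in this sketch.
TIMEX2_RE = re.compile(r'<TIMEX2\b([^>]*)>(.*?)</TIMEX2>', re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def extract_timex2(text):
    """Return a list of (surface string, attribute dict) pairs."""
    return [
        (m.group(2), dict(ATTR_RE.findall(m.group(1))))
        for m in TIMEX2_RE.finditer(text)
    ]


def count_tokens(text):
    """Count whitespace-separated tokens after removing the tags.

    Whitespace splitting is only a crude proxy for real tokenisation.
    """
    plain = re.sub(r'</?TIMEX2\b[^>]*>', '', text)
    return len(plain.split())


expressions = extract_timex2(SAMPLE)
for surface, attrs in expressions:
    print(surface, attrs)
print(count_tokens(SAMPLE), "tokens,", len(expressions), "TIMEX2 expressions")
```

Run over every document in a collection, the same two counts would give corpus-level figures comparable in kind to those reported above, subject to the tokenisation caveat.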