Automatic dating of documents and temporal text classification

Authors: Angelo Dalli; Yorick Wilks
Affiliations: University of Sheffield, United Kingdom; University of Sheffield, United Kingdom
Venue: ARTE '06: Proceedings of the Workshop on Annotating and Reasoning about Time and Events
Year: 2006

Abstract

The frequency of occurrence of words in natural languages exhibits a periodic and a non-periodic component when analysed as a time series. This work presents an unsupervised method for extracting periodicity information from text, allowing time series to be created and filtered and then used to build sophisticated language models that can distinguish repetitive trends from non-repetitive writing patterns. The algorithm runs in O(n log n) time for an input of length n. The temporal language model is used to create rules based on temporal-word associations inferred from the time series. These rules are then used to estimate likely document creation dates automatically, based on the assumption that natural languages have unique signatures of changing word distributions over time. Experimental results on news items spanning a nine-year period show that the proposed method and algorithms are accurate both in discovering periodicity patterns and in dating documents automatically, solely from their content.
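The abstract states an O(n log n) bound but not the technique behind it. One standard way to reach that bound is a fast Fourier transform over a word's frequency time series, so the sketch below assumes that approach, a daily sampling interval, and a hypothetical `dominant_period` helper. The helper also reports the share of spectral power at the strongest frequency as a rough, assumed signal of how periodic a word is. It is an illustration under those assumptions, not the authors' algorithm.

```python
import cmath
import math
from typing import List, Tuple


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT, O(n log n); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_period(counts: List[float]) -> Tuple[float, float]:
    """Hypothetical helper: return (period_in_steps, share_of_power).

    counts: frequency of one word per time step (assumed daily).
    The series is zero-padded to a power of two before the FFT, so the
    period is resolved only to n / k for integer frequency bins k.
    """
    if len(counts) < 2:
        raise ValueError("need at least two samples")
    n = 1
    while n < len(counts):
        n *= 2
    mean = sum(counts) / len(counts)
    # Subtract the mean so the zero-frequency (DC) term carries no power.
    padded = [complex(c - mean) for c in counts] + [0j] * (n - len(counts))
    spectrum = fft(padded)
    # For real input, bins n/2+1..n-1 mirror 1..n/2-1, so keep bins 1..n/2.
    power = [abs(spectrum[k]) ** 2 for k in range(1, n // 2 + 1)]
    total = sum(power)
    best = max(range(len(power)), key=power.__getitem__)
    share = power[best] / total if total else 0.0
    return n / (best + 1), share
```

For a word counted daily over 128 days with a clean 16-day cycle, `dominant_period` returns a period of 16.0 with almost all the power in that one bin. A one-off burst instead spreads its power evenly across frequencies and yields a small share, which is one simple way to separate recurring words from non-recurring ones. A weekly cycle in 128 samples lands on bin 18, i.e. a period of about 7.1, because of the bin resolution.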
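The dating step can be pictured as a small rule-based scorer. This is an assumption-laden sketch, not the paper's method: it assumes year-level granularity, learns hypothetical `word -> year` rules weighted by P(year | word) directly from counts in a dated training corpus (rather than from filtered time series), and guesses a date by summing the weights of every rule that fires on the document. The names `learn_word_date_rules`, `guess_year`, and the `min_count` threshold are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def learn_word_date_rules(docs: Iterable[Tuple[int, List[str]]],
                          min_count: int = 2) -> Dict[str, Dict[int, float]]:
    """Learn hypothetical 'word -> year' rules weighted by P(year | word).

    docs: (year, tokens) pairs from a dated training corpus.
    Words seen fewer than min_count times are dropped as unreliable.
    """
    per_word: Dict[str, Counter] = defaultdict(Counter)
    for year, tokens in docs:
        for tok in tokens:
            per_word[tok][year] += 1
    rules: Dict[str, Dict[int, float]] = {}
    for word, years in per_word.items():
        total = sum(years.values())
        if total >= min_count:
            # Relative frequency of each year among this word's occurrences.
            rules[word] = {y: c / total for y, c in years.items()}
    return rules


def guess_year(tokens: List[str],
               rules: Dict[str, Dict[int, float]]) -> Optional[int]:
    """Sum the weights of all rules fired by the tokens; return the top year."""
    scores: Counter = Counter()
    for tok in tokens:
        for year, weight in rules.get(tok, {}).items():
            scores[year] += weight
    if not scores:
        return None  # no known word, so no evidence for any date
    return scores.most_common(1)[0][0]
```

On a toy corpus where "ipod" occurs only in 2001 documents and "launch" occurs once in 1999 and once in 2001, `guess_year(["ipod", "launch"], rules)` scores 2001 at 1.5 and 1999 at 0.5 and returns 2001. Summing per-word weights keeps each rule independent and easy to inspect, at the cost of ignoring word interactions.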