Detecting events in a million New York Times articles

  • Authors:
  • Tristan Snowsill;Ilias Flaounas;Tijl De Bie;Nello Cristianini

  • Affiliations:
  • Department of Engineering Mathematics, University of Bristol;Department of Computer Science, University of Bristol;Department of Engineering Mathematics, University of Bristol;Department of Engineering Mathematics, University of Bristol and Department of Computer Science, University of Bristol

  • Venue:
  • ECML PKDD'10: Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases: Part III
  • Year:
  • 2010

Abstract

We demonstrate a newly developed method for detecting events in text streams, applied to more than a million articles from the New York Times corpus. The method is designed to operate predominantly on-line, reporting new events within a specified timeframe. It detects events as significant changes in the statistical properties of the text, which are stored and updated efficiently in a suffix tree. This demonstration shows that the method discovers both short-term and long-term events (often called topics) and that it copes automatically with topic drift on a corpus of 1 035 263 articles.
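
The idea of flagging significant changes in text statistics can be illustrated with a minimal sketch. This is not the authors' implementation: a plain dictionary of word n-gram counts stands in for the suffix tree, and a normal approximation to a binomial test stands in for whatever significance test the method actually uses. The function names, the add-one smoothing, and the threshold of 3.0 are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_counts(docs, n=2):
    """Count word n-grams over a list of documents (hypothetical helper)."""
    counts = Counter()
    for doc in docs:
        words = doc.lower().split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def surprising_ngrams(history_docs, window_docs, n=2, threshold=3.0):
    """Return (ngram, z) pairs that are unusually frequent in the current window.

    Each n-gram's count in the window is compared with the count expected
    from its smoothed rate in the history, using a z-score.
    """
    hist = ngram_counts(history_docs, n)
    cur = ngram_counts(window_docs, n)
    hist_total = sum(hist.values())
    cur_total = sum(cur.values())
    if hist_total == 0 or cur_total == 0:
        return []

    results = []
    for gram, k in cur.items():
        # Add-one smoothing (an assumption): unseen n-grams get a small
        # non-zero baseline rate instead of zero.
        p = (hist[gram] + 1) / (hist_total + len(hist) + 1)
        expected = cur_total * p
        std = sqrt(cur_total * p * (1 - p))
        z = (k - expected) / std
        if z >= threshold:
            results.append((gram, z))
    # Most surprising n-grams first.
    return sorted(results, key=lambda item: -item[1])


history = ["the mayor spoke about the budget today"] * 50
window = (["the mayor spoke about the budget today"] * 5
          + ["hurricane katrina hits new orleans"] * 5)
for gram, z in surprising_ngrams(history, window):
    print(" ".join(gram), round(z, 1))
```

In an on-line setting the window counts would be folded into the history after each reporting period, so the baseline follows gradual changes in vocabulary. A suffix tree, as used in the paper, avoids fixing the n-gram length in advance.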