Extracting significant time varying features from text

  • Authors:
  • Russell Swan; James Allan

  • Affiliations:
  • Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts, Amherst, Massachusetts (both authors)

  • Venue:
  • Proceedings of the Eighth International Conference on Information and Knowledge Management
  • Year:
  • 1999

Abstract

We propose a simple statistical model for the frequency of occurrence of features in a stream of text. Adopting this model allows us to use classical significance tests to filter the stream for interesting events. We tested the model by building a system and running it on a news corpus. In a subjective evaluation, the system worked remarkably well: almost all of the groups of identified tokens corresponded to news stories and were appropriately placed in time. A preliminary objective evaluation was also used to measure the quality of the system; it revealed both weaknesses and strengths of our approach.
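
The abstract does not say which significance test or feature representation the system uses, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that a feature is counted once per document, that the stream is bucketed by day, and that a Pearson chi-square test on a 2x2 contingency table (feature present or absent, on this day or on other days) serves as the classical test. The function names, the daily bucketing, and the 3.841 critical value (the 5% level for one degree of freedom) are all assumptions made for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the test is undefined; treat as no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def significant_days(daily_hits, daily_docs, critical=3.841):
    """Return the indices of days on which a feature is unusually frequent.

    daily_hits[i]: documents on day i that contain the feature (hypothetical input)
    daily_docs[i]: total documents on day i (hypothetical input)
    critical:      chi-square critical value; 3.841 is the 5% level for 1 d.o.f.
    """
    total_hits = sum(daily_hits)
    total_docs = sum(daily_docs)
    flagged = []
    for day, (hits, docs) in enumerate(zip(daily_hits, daily_docs)):
        a = hits                        # feature present, this day
        b = docs - hits                 # feature absent, this day
        c = total_hits - hits           # feature present, other days
        d = (total_docs - docs) - c     # feature absent, other days
        stat = chi_square_2x2(a, b, c, d)
        # Keep only bursts: the feature's rate on this day exceeds its rate elsewhere.
        if stat > critical and a * d > b * c:
            flagged.append(day)
    return flagged


if __name__ == "__main__":
    docs_per_day = [100, 100, 100, 100, 100]
    hits_per_day = [2, 1, 30, 2, 1]
    print(significant_days(hits_per_day, docs_per_day))
```

With these made-up counts, only the third day (index 2) is flagged. Grouping consecutive flagged days for a feature would give it a time span, which is one plausible way to place identified tokens in time.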