An algorithm for unsupervised topic discovery from broadcast news stories

  • Authors:
  • Sreenivasa Sista;Richard Schwartz;Timothy R. Leek;John Makhoul

  • Affiliations:
  • Northeastern University, Boston, MA;BBN Technologies, Cambridge, MA;BBN Technologies, Cambridge, MA;BBN Technologies, Cambridge, MA

  • Venue:
  • HLT '02 Proceedings of the second international conference on Human Language Technology Research
  • Year:
  • 2002

Abstract

The cost of annotating a large corpus with thousands of distinct topics is high. In addition, human annotators usually fail to indicate all of the relevant topics for each document. It would be desirable to determine the topics in any new domain or language automatically, given only a large corpus in that domain and language. We present an algorithm called Unsupervised Topic Discovery, which creates topics from a collection of news stories, provides a human-understandable label for each topic, and then assigns the topics to the news stories. Finally, we report results on a collection of broadcast news stories in English and Arabic.