Topic activation analysis for document streams based on document arrival rate and relevance

  • Authors:
  • Chunhua Cui; Hiroyuki Kitagawa

  • Affiliations:
  • University of Tsukuba; University of Tsukuba

  • Venue:
  • Proceedings of the 2005 ACM symposium on Applied computing
  • Year:
  • 2005

Abstract

With the advance of network technology in recent years, the dissemination and exchange of massive volumes of documents has become commonplace. Accordingly, the importance of content analysis techniques is increasing. Topic analysis in large-scale document streams such as e-mail and news articles is an important research issue. This paper addresses techniques for "topic activation analysis" for document streams. For example, when news articles with a strong relationship to a given topic arrive frequently in a news stream, we can regard the activation level of the topic as high. In [1], Kleinberg proposed a method for analyzing document streams. Although the main objective of his method was to detect bursts of topics, it can also be used for topic activation analysis. His method, however, has a serious limitation in that it looks only at the arrival rate of documents and ignores the degree of relevance of each document to the topic. Another limitation is that his method is "batch-oriented." This paper first proposes a novel topic activation analysis scheme that incorporates both document arrival rate and relevance to address the first problem. It then presents an incremental scheme, more appropriate for a document streaming environment, to address the second. The proposed schemes are validated by experiments using real CNN news articles.
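
The abstract does not spell out the scoring model, so the following is only an illustrative sketch of the general idea and not the authors' scheme. It assumes a simple exponentially decayed sum in which each arriving document adds its relevance score to the topic's activation level, and the level fades between arrivals. The class name `DecayedActivation`, the `half_life` parameter, the helper `final_level`, and the sample streams are all hypothetical. The update is incremental in the sense that each document is folded in as it arrives, without re-reading earlier documents.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class DecayedActivation:
    """Hypothetical activation tracker: an exponentially decayed relevance sum.

    This is an assumed stand-in for illustration, not the model from the paper.
    """
    half_life: float                   # time after which past activation halves
    level: float = 0.0                 # current activation level
    last_time: Optional[float] = None  # arrival time of the last document seen

    def update(self, arrival_time: float, relevance: float) -> float:
        """Fold one document into the activation level and return the new level."""
        if self.last_time is not None:
            gap = arrival_time - self.last_time
            if gap < 0:
                raise ValueError("documents must arrive in time order")
            # Decay the old level according to the time since the last arrival.
            self.level *= 0.5 ** (gap / self.half_life)
        # A more relevant document contributes more activation.
        self.level += relevance
        self.last_time = arrival_time
        return self.level


def final_level(stream: Iterable[Tuple[float, float]], half_life: float = 1.0) -> float:
    """Process (arrival_time, relevance) pairs one at a time; return the last level."""
    tracker = DecayedActivation(half_life=half_life)
    for arrival_time, relevance in stream:
        tracker.update(arrival_time, relevance)
    return tracker.level


# Hypothetical streams: same arrival times, different relevance to the topic.
strong = [(0.0, 0.9), (1.0, 0.8), (2.0, 0.9)]
weak = [(0.0, 0.1), (1.0, 0.2), (2.0, 0.1)]

# Hypothetical streams: same relevance, different arrival rates.
dense = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
sparse = [(0.0, 1.0), (10.0, 1.0), (20.0, 1.0)]
```

Under these assumptions, `final_level(strong)` exceeds `final_level(weak)` even though both streams have the same arrival rate, which is the distinction a rate-only analysis cannot make. Likewise, `final_level(dense)` exceeds `final_level(sparse)`, so the arrival rate still matters.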