Automatic text processing.
On-line new event detection and tracking. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval.
Improving text categorization methods for event tracking. SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval.
Bursty and hierarchical structure in streams. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining.
On the bursty evolution of blogspace. WWW '03 Proceedings of the 12th international conference on World Wide Web.
Online Data Mining for Co-Evolving Time Sequences. ICDE '00 Proceedings of the 16th International Conference on Data Engineering.
Topic analysis using a finite mixture model. Information Processing and Management: an International Journal.
Research intelligence involving information retrieval - An example of conferences and journals. Expert Systems with Applications: An International Journal.
Indices of novelty for emerging topic detection. Information Processing and Management: an International Journal.
Advances in network technology have made the dissemination and exchange of large volumes of documents commonplace, and content analysis techniques have grown in importance accordingly. Topic analysis of large-scale document streams, such as e-mail and news articles, is an important research issue. This paper addresses techniques for "topic activation analysis" of document streams. For example, when news articles strongly related to a given topic arrive frequently in a news stream, we can regard the activation level of that topic as high. In [1], Kleinberg proposed a method for analyzing document streams. Although the main objective of his method was to detect bursts of topics, it can also be used for topic activation analysis. His method, however, has a serious limitation: it considers only the arrival rate of documents and ignores each document's degree of relevance to the topic. Another limitation is that his method is "batch-oriented." This paper first proposes a novel topic activation analysis scheme that incorporates both document arrival rate and relevance, addressing the first limitation. It then presents an incremental scheme better suited to a document streaming environment, addressing the second. The proposed schemes are validated by experiments on real CNN news articles.
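
The abstract does not give the scheme's formulas, so the following is only an illustrative sketch of what combining arrival rate with relevance, and updating incrementally, could look like. It assumes an exponentially decaying, relevance-weighted sum; the decay model, the `half_life` parameter, and the names `TopicActivation` and `batch_activation` are hypothetical and are not taken from the paper.

```python
import math


class TopicActivation:
    """Hypothetical incremental tracker; NOT the scheme proposed in the paper."""

    def __init__(self, half_life: float):
        # Assumption: a document's contribution halves every `half_life` time units.
        self.decay = math.log(2) / half_life
        self.level = 0.0
        self.last_time = None

    def update(self, time: float, relevance: float) -> float:
        """Fold in one arriving document (times must be non-decreasing).

        `relevance` is an assumed topic-relevance score, e.g. in [0, 1].
        """
        if self.last_time is not None:
            # Decay the stored level over the gap since the previous document.
            self.level *= math.exp(-self.decay * (time - self.last_time))
        self.level += relevance
        self.last_time = time
        return self.level


def batch_activation(docs, now: float, half_life: float) -> float:
    """Batch equivalent: recompute the decayed sum over all (time, relevance) pairs."""
    decay = math.log(2) / half_life
    return sum(r * math.exp(-decay * (now - t)) for t, r in docs)
```

Under these assumptions, frequent arrivals of highly relevant documents push the level up, while sparse or weakly relevant arrivals let it decay. The incremental update stores only the current level and the last arrival time, yet yields the same value as the batch recomputation over the full history.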