Using Burstiness to Improve Clustering of Topics in News Streams

Authors:
Qi He;Kuiyu Chang;Ee-Peng Lim
Venue:
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Year:
2007

Citing: 0
Cited by: 12

Automatic online news topic ranking using media focus and user attention based on aging theory
Proceedings of the 17th ACM conference on Information and knowledge management

An Automatic Online News Topic Keyphrase Extraction System
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01

On burstiness-aware search for document sequences
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Emerging topic detection on Twitter based on temporal and social terms evaluation
Proceedings of the Tenth International Workshop on Multimedia Data Mining

Trajectory-based visualization of web video topics
Proceedings of the international conference on Multimedia

Event detection with spatial latent Dirichlet allocation
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries

Analyzing word frequencies in large text corpora using inter-arrival times and bootstrapping
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II

On the spatiotemporal burstiness of terms
Proceedings of the VLDB Endowment

A novel burst-based text representation model for scalable event detection
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2

Detecting real-time burst topics in microblog streams: how sentiment can help
Proceedings of the 22nd international conference on World Wide Web companion

Personalized emerging topic detection based on a term aging model
ACM Transactions on Intelligent Systems and Technology (TIST) - Special Section on Intelligent Mobile Knowledge Discovery and Management Systems and Special Issue on Social Web Mining

A Graph Analytical Approach for Topic Detection
ACM Transactions on Internet Technology (TOIT)

Abstract

Specialists who analyze online news have a hard time separating the wheat from the chaff. Moreover, automatic data-mining techniques such as clustering news streams into topical groups can fully recover the underlying true class labels of the data if and only if all classes are well separated. In reality, especially for news streams, this is clearly not the case. The question to ask is thus: if we cannot recover all C classes by clustering, what is the largest number K ≤ C of clusters we can find that best resemble K of the underlying classes? Using the intuition that bursty topics are more likely to correspond to important events of interest to analysts, we propose several new bursty vector space models (B-VSM) for representing a news document. B-VSM takes into account the burstiness (across the full corpus and whole duration) of each constituent word in a document at the time of publication. We benchmarked B-VSM against the classical TFIDF-VSM on the task of clustering a collection of news stream articles with known topic labels. Experimental results show that B-VSM was able to find the burstiest clusters/topics. Further, it significantly improved the recall and precision for the top K clusters/topics.
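
The abstract does not state how B-VSM computes its weights, so the following is only an illustrative sketch of the general idea of a burst-aware document vector, not the authors' formulation. It assumes that a word's burstiness on a given day can be approximated by the positive z-score of the fraction of that day's documents containing the word, and that this score simply boosts the ordinary TF-IDF weight. The function names `burstiness` and `bursty_vectors`, the z-score estimate, and the `1 + burst` boost are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


def burstiness(docs):
    """Return {(term, day): score >= 0} for docs given as (day, tokens) pairs.

    Assumption for this sketch: the burst score is the positive z-score of
    the fraction of a day's documents that contain the term.
    """
    days = sorted({day for day, _ in docs})
    docs_per_day = Counter(day for day, _ in docs)
    per_term = defaultdict(Counter)  # term -> day -> docs containing the term
    for day, tokens in docs:
        for term in set(tokens):
            per_term[term][day] += 1

    scores = {}
    for term, per_day in per_term.items():
        rates = [per_day[d] / docs_per_day[d] for d in days]
        mean = sum(rates) / len(rates)
        std = math.sqrt(sum((r - mean) ** 2 for r in rates) / len(rates))
        for d, r in zip(days, rates):
            scores[(term, d)] = max(0.0, (r - mean) / std) if std > 0 else 0.0
    return scores


def bursty_vectors(docs):
    """Return one sparse vector (dict) per document: TF-IDF boosted by burstiness."""
    n_docs = len(docs)
    doc_freq = Counter(t for _, tokens in docs for t in set(tokens))
    burst = burstiness(docs)
    vectors = []
    for day, tokens in docs:
        vec = {}
        for term, count in Counter(tokens).items():
            tfidf = count * math.log(n_docs / doc_freq[term])
            # Hypothetical combination: a term that is bursty on the
            # publication day gets a proportionally larger weight.
            vec[term] = tfidf * (1.0 + burst[(term, day)])
        vectors.append(vec)
    return vectors


# Toy stream: "quake" appears only on day 1, "market" appears every day.
docs = [
    (0, ["market", "rain"]),
    (1, ["market", "quake", "quake"]),
    (2, ["market", "sun"]),
    (1, ["quake", "rescue"]),
]
vectors = bursty_vectors(docs)
```

In this toy stream, the weight of "quake" in the day-1 documents ends up above its plain TF-IDF weight, while "market" on day 1 (where it is less frequent than usual) keeps its plain TF-IDF weight. The resulting vectors could then be passed to any standard clustering routine in place of TF-IDF vectors.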