Automatic online news topic ranking using media focus and user attention based on aging theory
Proceedings of the 17th ACM conference on Information and knowledge management
An Automatic Online News Topic Keyphrase Extraction System
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
On burstiness-aware search for document sequences
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Emerging topic detection on Twitter based on temporal and social terms evaluation
Proceedings of the Tenth International Workshop on Multimedia Data Mining
Trajectory-based visualization of web video topics
Proceedings of the international conference on Multimedia
Event detection with spatial latent Dirichlet allocation
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Analyzing word frequencies in large text corpora using inter-arrival times and bootstrapping
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
On the spatiotemporal burstiness of terms
Proceedings of the VLDB Endowment
A novel burst-based text representation model for scalable event detection
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Detecting real-time burst topics in microblog streams: how sentiment can help
Proceedings of the 22nd international conference on World Wide Web companion
Personalized emerging topic detection based on a term aging model
ACM Transactions on Intelligent Systems and Technology (TIST) - Special Section on Intelligent Mobile Knowledge Discovery and Management Systems and Special Issue on Social Web Mining
A Graph Analytical Approach for Topic Detection
ACM Transactions on Internet Technology (TOIT)
Specialists who analyze online news have a hard time separating the wheat from the chaff. Moreover, automatic data-mining techniques such as clustering news streams into topical groups can fully recover the underlying true class labels of the data if and only if all classes are well separated. In reality, especially for news streams, this is clearly not the case. The question to ask is thus: if we cannot recover all C classes by clustering, what is the largest number K ≤ C of clusters we can find that best resemble K of the underlying classes? Using the intuition that bursty topics are more likely to correspond to important events of interest to analysts, we propose several new bursty vector space models (B-VSM) for representing a news document. B-VSM takes into account the burstiness (across the full corpus and whole duration) of each constituent word in a document at the time of publication. We benchmarked B-VSM against the classical TFIDF-VSM on the task of clustering a collection of news stream articles with known topic labels. Experimental results show that B-VSM was able to find the burstiest clusters/topics. It also significantly improved recall and precision for the top K clusters/topics.
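The abstract does not give the B-VSM weighting formulas, so the following is only a minimal sketch of the general idea: weight each word in a document by how bursty that word is on the document's publication day. The burstiness score used here (a word's daily document rate divided by its rate over the whole corpus, minus one, floored at zero) is an assumption chosen for illustration, not the paper's definition, and the names `burstiness` and `bursty_vector` are hypothetical.

```python
from collections import Counter, defaultdict


def burstiness(docs):
    """Score each (word, day) pair over a corpus of (day, tokens) documents.

    Assumed score (not taken from the paper): the fraction of that day's
    documents containing the word, divided by the fraction of all documents
    containing it, minus 1, floored at 0. A word that is no more frequent
    than usual on a given day therefore scores 0.
    """
    docs_per_day = Counter(day for day, _ in docs)
    # word -> day -> number of documents on that day containing the word
    doc_freq = defaultdict(Counter)
    for day, tokens in docs:
        for word in set(tokens):
            doc_freq[word][day] += 1

    total_docs = len(docs)
    scores = {}
    for word, per_day in doc_freq.items():
        overall_rate = sum(per_day.values()) / total_docs
        for day, count in per_day.items():
            day_rate = count / docs_per_day[day]
            scores[(word, day)] = max(day_rate / overall_rate - 1.0, 0.0)
    return scores


def bursty_vector(day, tokens, scores):
    """Represent one document as {word: term frequency * burstiness}.

    Words that are not bursty on the publication day are dropped, so the
    vector is sparser than a plain term-frequency vector.
    """
    term_freq = Counter(tokens)
    vector = {}
    for word, freq in term_freq.items():
        score = scores.get((word, day), 0.0)
        if score > 0:
            vector[word] = freq * score
    return vector


# Toy corpus: (publication day, tokens). "market" appears on both days,
# while "quake" appears only on day 2.
docs = [
    (1, ["market", "rises"]),
    (1, ["market", "falls"]),
    (2, ["quake", "market"]),
    (2, ["quake", "rescue", "quake"]),
]
scores = burstiness(docs)
print(bursty_vector(2, ["quake", "market"], scores))
```

In this toy corpus "market" is less frequent on day 2 than across the whole duration, so it drops out of the day-2 vectors, while "quake" is kept. Vectors of this kind could then be passed to any standard clustering algorithm in place of TF-IDF vectors.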