News clustering, categorization, and analysis are key components of any news portal. They require algorithms that can handle dynamic data in order to cluster, interpret, and temporally aggregate news articles. These three tasks are often solved separately. In this paper we present a unified framework that groups incoming news articles into temporary but tightly focused storylines, identifies prevalent topics and key entities within these stories, and reveals the temporal structure of stories as they evolve. We achieve this by building a hybrid clustering and topic model. To deal with the large volume of available data, we build an efficient parallel inference algorithm based on sequential Monte Carlo estimation. Time and memory costs are nearly constant in the length of the history, and the approach scales to hundreds of thousands of documents. We demonstrate its efficiency and accuracy on the publicly available TDT dataset and on data from a major internet news site.
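To make the idea of sequential Monte Carlo inference over a document stream concrete, the following is a minimal sketch of a generic particle filter that assigns each incoming document to a story. It is not the model described above: it assumes a plain Chinese restaurant process prior and a Dirichlet-multinomial word likelihood, and it omits the topic, entity, and temporal components as well as any parallelism. The names `Particle`, `cluster_stream`, and `log_predictive`, and the constants `ALPHA`, `BETA`, and `VOCAB`, are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter

ALPHA = 1.0   # assumed CRP concentration: prior mass for opening a new story
BETA = 0.1    # assumed symmetric Dirichlet smoothing over words
VOCAB = 1000  # assumed vocabulary size


def log_predictive(doc, counts, total):
    """Log Dirichlet-multinomial predictive of doc given a story's word counts."""
    logp, seen, n = 0.0, Counter(), 0
    for w in doc:
        logp += math.log((counts.get(w, 0) + seen[w] + BETA)
                         / (total + n + BETA * VOCAB))
        seen[w] += 1
        n += 1
    return logp


class Particle:
    """One hypothesis about how the documents seen so far split into stories."""

    def __init__(self):
        self.sizes, self.counts, self.totals, self.labels = [], [], [], []

    def copy(self):
        p = Particle()
        p.sizes = list(self.sizes)
        p.counts = [Counter(c) for c in self.counts]
        p.totals = list(self.totals)
        p.labels = list(self.labels)
        return p

    def step(self, doc, rng):
        """Sample a story for doc and return the log incremental weight."""
        # Unnormalised log posterior over existing stories plus one new story.
        # The CRP normaliser is identical for all particles, so it is dropped.
        logs = [math.log(n) + log_predictive(doc, c, t)
                for n, c, t in zip(self.sizes, self.counts, self.totals)]
        logs.append(math.log(ALPHA) + log_predictive(doc, {}, 0))
        m = max(logs)
        probs = [math.exp(x - m) for x in logs]
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(self.sizes):  # open a new story
            self.sizes.append(0)
            self.counts.append(Counter())
            self.totals.append(0)
        self.sizes[k] += 1
        self.counts[k].update(doc)
        self.totals[k] += len(doc)
        self.labels.append(k)
        return m + math.log(sum(probs))


def cluster_stream(docs, n_particles=20, seed=0):
    """Process docs once, in order, and return the labels of the best particle."""
    rng = random.Random(seed)
    particles = [Particle() for _ in range(n_particles)]
    logw = [0.0] * n_particles
    for doc in docs:
        logw = [lw + p.step(doc, rng) for lw, p in zip(logw, particles)]
        m = max(logw)
        w = [math.exp(x - m) for x in logw]
        s = sum(w)
        w = [x / s for x in w]
        # Resample when the effective sample size drops below half.
        if 1.0 / sum(x * x for x in w) < n_particles / 2:
            idx = rng.choices(range(n_particles), weights=w, k=n_particles)
            particles = [particles[i].copy() for i in idx]
            logw = [0.0] * n_particles
    best = max(range(n_particles), key=lambda i: logw[i])
    return particles[best].labels


docs = [["quake", "tsunami", "japan"] * 3, ["election", "vote", "senate"] * 3,
        ["quake", "japan", "rescue"] * 3, ["vote", "senate", "recount"] * 3]
labels = cluster_stream(docs)
```

Each document is visited once and only per-story sufficient statistics are carried forward. In this toy version the number of stories and the per-document label list still grow with the stream, so it does not by itself achieve the near-constant time and memory costs claimed above.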