Unified analysis of streaming news

Authors:
Amr Ahmed;Qirong Ho;Jacob Eisenstein;Eric Xing;Alexander J. Smola;Choon Hui Teo
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA;Carnegie Mellon University, Pittsburgh, PA, USA;Carnegie Mellon University, Pittsburgh, PA, USA;Carnegie Mellon University, Pittsburgh, USA;Yahoo! Research, Santa Clara, CA, USA;Yahoo! Research, Santa Clara, CA, USA
Venue:
Proceedings of the 20th international conference on World wide web
Year:
2011

Abstract

News clustering, categorization, and analysis are key components of any news portal. They require algorithms that can handle dynamic data in order to cluster, interpret, and temporally aggregate news articles. These three tasks are often solved separately. In this paper we present a unified framework that groups incoming news articles into temporary but tightly focused storylines, identifies prevalent topics and key entities within these stories, and reveals the temporal structure of stories as they evolve. We achieve this by building a hybrid clustering and topic model. To deal with the available wealth of data, we build an efficient parallel inference algorithm based on sequential Monte Carlo estimation. Time and memory costs are nearly constant in the length of the history, and the approach scales to hundreds of thousands of documents. We demonstrate its efficiency and accuracy on the publicly available TDT dataset and on data from a major internet news site.
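
To make the idea of sequential Monte Carlo inference over an evolving set of storylines more concrete, the following is a minimal illustrative sketch and not the model or algorithm from the paper. It assumes a much simpler setup: documents are lists of integer token ids, each storyline is a bag of words with Dirichlet smoothing, and storyline popularity decays at every step so that quiet storylines are forgotten. All names and constants (`Particle`, `smc_cluster`, `ALPHA`, `BETA`, `DECAY`, `MIN_SIZE`, `VOCAB_SIZE`) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter

# All constants below are illustrative assumptions, not values from the paper.
VOCAB_SIZE = 1000  # assumed vocabulary size
ALPHA = 1.0        # prior weight for opening a new storyline
BETA = 0.1         # Dirichlet smoothing on word counts
DECAY = 0.5        # per-document decay of storyline popularity
MIN_SIZE = 0.01    # storylines below this decayed size are forgotten


def log_predictive(doc, counts, total):
    """Log Dirichlet-multinomial predictive probability of a token list."""
    seen = Counter()
    logp = 0.0
    for i, w in enumerate(doc):
        num = counts.get(w, 0) + seen[w] + BETA
        den = total + i + BETA * VOCAB_SIZE
        logp += math.log(num / den)
        seen[w] += 1
    return logp


class Particle:
    """One hypothesis about which storyline each document belongs to."""

    def __init__(self):
        self.stories = {}      # id -> [decayed size, word counts, token total]
        self.next_id = 0
        self.assignments = []  # output only: storyline id per document

    def copy(self):
        p = Particle()
        p.stories = {k: [s[0], Counter(s[1]), s[2]] for k, s in self.stories.items()}
        p.next_id = self.next_id
        p.assignments = list(self.assignments)
        return p

    def step(self, doc, rng):
        """Sample a storyline for doc; return the log marginal likelihood of doc."""
        # Decay popularity and drop quiet storylines, so the per-document
        # cost depends on the active storylines, not on the full history.
        for s in self.stories.values():
            s[0] *= DECAY
        self.stories = {k: s for k, s in self.stories.items() if s[0] >= MIN_SIZE}
        ids = list(self.stories)
        logp = [math.log(self.stories[k][0])
                + log_predictive(doc, self.stories[k][1], self.stories[k][2])
                for k in ids]
        # Option of opening a brand-new storyline.
        ids.append(self.next_id)
        logp.append(math.log(ALPHA) + log_predictive(doc, {}, 0))
        norm = sum(s[0] for s in self.stories.values()) + ALPHA
        m = max(logp)
        probs = [math.exp(x - m) for x in logp]
        k = rng.choices(ids, weights=probs)[0]
        if k == self.next_id:
            self.stories[k] = [0.0, Counter(), 0]
            self.next_id += 1
        story = self.stories[k]
        story[0] += 1.0
        story[1].update(doc)
        story[2] += len(doc)
        self.assignments.append(k)
        return m + math.log(sum(probs)) - math.log(norm)


def smc_cluster(docs, num_particles=8, seed=0):
    """Process docs in arrival order; return the assignments of the best particle."""
    rng = random.Random(seed)
    particles = [Particle() for _ in range(num_particles)]
    logw = [0.0] * num_particles
    for doc in docs:
        logw = [lw + p.step(doc, rng) for lw, p in zip(logw, particles)]
        m = max(logw)
        w = [math.exp(x - m) for x in logw]
        total = sum(w)
        w = [x / total for x in w]
        # Resample when the effective sample size falls below half the particles.
        ess = 1.0 / sum(x * x for x in w)
        if ess < num_particles / 2:
            chosen = rng.choices(particles, weights=w, k=num_particles)
            particles = [p.copy() for p in chosen]
            logw = [0.0] * num_particles
    best = max(range(num_particles), key=lambda i: logw[i])
    return particles[best].assignments
```

Calling `smc_cluster([[0, 1, 2], [0, 1, 1], [10, 11, 12]])` returns one storyline id per document, in arrival order. Because each particle keeps only decayed summaries of its active storylines, the work per incoming document in this sketch does not grow with the number of documents already processed; only the returned assignment list does.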