Joint topic modeling for event summarization across news and social media streams

Authors:
Wei Gao;Peng Li;Kareem Darwish
Affiliations:
Qatar Computing Research Institute, Doha, Qatar;Shanghai Jiaotong University, Shanghai, China;Qatar Computing Research Institute, Doha, Qatar
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Citing 17
Cited 1

Probabilistic latent semantic indexing

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Latent dirichlet allocation

The Journal of Machine Learning Research
A cross-collection mixture model for comparative text mining

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Automatic evaluation of summaries using N-gram co-occurrence statistics

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Bayesian query-focused summarization

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A generalized Co-HITS algorithm and its application to bipartite graphs

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Exploring content models for multi-document summarization

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Contrastive summarization: an experiment with consumer reviews

NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
LexRank: graph-based lexical centrality as salience in text summarization

Journal of Artificial Intelligence Research
Generating comparative summaries of contradictory opinions in text

Proceedings of the 18th ACM conference on Information and knowledge management
Cross-cultural analysis of blogs and forums with mixed-collection topic models

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
TwitterRank: finding topic-sensitive influential twitterers

Proceedings of the third ACM international conference on Web search and data mining
Similarity measures for short segments of text

ECIR'07 Proceedings of the 29th European conference on IR research
Summarizing contrastive viewpoints in opinionated text

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Comparing twitter and traditional media using topic models

ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
Social context summarization

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images

IEEE Transactions on Pattern Analysis and Machine Intelligence

Towards automatic assessment of the social media impact of news content

Proceedings of the 22nd international conference on World Wide Web companion

Quantified Score

Hi-index	0.00

Visualization

Abstract

Social media streams such as Twitter are regarded as faster first-hand sources of information generated by massive users. The content diffused through this channel, although noisy, provides important complement and sometimes even a substitute to the traditional news media reporting. In this paper, we propose a novel unsupervised approach based on topic modeling to summarize trending subjects by jointly discovering the representative and complementary information from news and tweets. Our method captures the content that enriches the subject matter by reinforcing the identification of complementary sentence-tweet pairs. To valuate the complementarity of a pair, we leverage topic modeling formalism by combining a two-dimensional topic-aspect model and a cross-collection approach in the multi-document summarization literature. The final summaries are generated by co-ranking the news sentences and tweets in both sides simultaneously. Experiments give promising results as compared to state-of-the-art baselines.