Learning summary content units with topic modeling

Authors:
Leonhard Hennig;Ernesto William De Luca;Sahin Albayrak
Affiliations:
Technische Universität Berlin;Technische Universität Berlin;Technische Universität Berlin
Venue:
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Year:
2010

Citing 10
Cited 1

Probabilistic latent semantic indexing

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Modern Information Retrieval

Modern Information Retrieval
Latent dirichlet allocation

The Journal of Machine Learning Research
Automatic evaluation of summaries using N-gram co-occurrence statistics

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Topic modeling: beyond bag-of-words

ICML '06 Proceedings of the 23rd international conference on Machine learning
The Pyramid Method: Incorporating human content selection variation in summarization evaluation

ACM Transactions on Speech and Language Processing (TSLP)
Studying the history of ideas using topic models

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Multi-document summarization using sentence-based topic models

ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Automatically evaluating content selection in summarization without human models

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
On smoothing and inference for topic models

UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence

ImpactWheel: Visual Analysis of the Impact of Online News

WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01

Quantified Score

Hi-index	0.00

Visualization

Abstract

In the field of multi-document summarization, the Pyramid method has become an important approach for evaluating machine-generated summaries. The method is based on the manual annotation of text spans with the same meaning in a set of human model summaries. In this paper, we present an unsupervised, probabilistic topic modeling approach for automatically identifying such semantically similar text spans. Our approach reveals some of the structure of model summaries and identifies topics that are good approximations of the Summary Content Units (SCU) used in the Pyramid method. Our results show that the topic model identifies topic-sentence associations that correspond to the contributors of SCUs, suggesting that the topic modeling approach can generate a viable set of candidate SCUs for facilitating the creation of Pyramids.