Discovery of topically coherent sentences for extractive summarization

Authors:
Asli Celikyilmaz;Dilek Hakkani-Tür
Affiliations:
Microsoft Speech Labs, Mountain View, CA;Microsoft Speech Labs | Microsoft Research, Mountain View, CA
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Year:
2011

Abstract

Extractive methods for multi-document summarization are mainly governed by information overlap, coherence, and content constraints. We present an unsupervised probabilistic approach that models the hidden abstract concepts across documents, as well as the correlation between these concepts, to generate topically coherent and non-redundant summaries. Based on human evaluations, our models generate summaries with higher linguistic quality in terms of coherence, readability, and redundancy than benchmark systems. Although our system is unsupervised and optimized for topical coherence, we achieve a ROUGE score of 44.1 on the DUC-07 test set, roughly in the range of state-of-the-art supervised models.
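
For readers unfamiliar with the ROUGE figure quoted above, the following is a minimal sketch of how a unigram-recall score can be computed. It is an illustration only: the abstract does not say which ROUGE variant or settings were used, so the choice of unigram recall, the whitespace tokenization, and the function name rouge_1_recall are all assumptions made here, not the authors' evaluation setup.

```python
from collections import Counter


def rouge_1_recall(candidate: str, reference: str) -> float:
    """Unigram recall of a candidate summary against one reference summary.

    Assumption: lowercased whitespace tokens, with no stemming or stopword
    removal. Standard ROUGE tooling offers such options; this sketch does not.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    if not ref_counts:
        return 0.0
    # Each reference word is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand_counts[word]) for word, count in ref_counts.items())
    return overlap / sum(ref_counts.values())


if __name__ == "__main__":
    score = rouge_1_recall("the cat sat", "the cat sat on the mat")
    print(f"ROUGE-1 recall: {score:.2f}")
```

Scores of this kind are commonly reported multiplied by 100, which is presumably how a value such as 44.1 should be read; evaluations on DUC data typically also average over several human-written reference summaries, which this single-reference sketch omits.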