A hybrid hierarchical model for multi-document summarization

Authors: Asli Celikyilmaz; Dilek Hakkani-Tur
Affiliations: University of California, Berkeley; International Computer Science Institute, Berkeley, CA
Venue: ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Year: 2010

Abstract

Scoring the sentences of documents against abstract summaries written by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two-step learning problem: a generative model for pattern discovery and a regression model for inference. First, we use a hierarchical topic model to calculate scores for the sentences in document clusters based on their latent characteristics. Then, using these scores as targets, we train a regression model on the lexical and structural characteristics of the sentences, and use it to score the sentences of new documents and form a summary. Our system advances the current state of the art, improving ROUGE scores by about 7%. Manual quality evaluations indicate that the generated summaries are less redundant and more coherent.
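The two-step pipeline in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' system and covers only the second half under loud assumptions: it does not implement the hierarchical topic model; instead a hypothetical `proxy_score` (the share of a sentence's words that appear in the human summary) stands in for the topic-model sentence scores. The feature set in `features` is invented for illustration, ridge regression is a swapped-in choice of regression model, and the greedy word-budget selection with a Jaccard redundancy filter is an assumed way to "form a summary". All function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercased alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relative_freq(sentences):
    """Relative frequency of each token across the whole document cluster."""
    counts = Counter(t for s in sentences for t in tokenize(s))
    total = sum(counts.values()) or 1
    return {t: c / total for t, c in counts.items()}


def proxy_score(sentence, human_summary):
    """HYPOTHETICAL stand-in for the topic-model score: share of the sentence's
    distinct words that also occur in the human-written summary."""
    words, ref = set(tokenize(sentence)), set(tokenize(human_summary))
    return len(words & ref) / len(words) if words else 0.0


def features(sentence, position, n_sentences, cluster_freq):
    """HYPOTHETICAL lexical/structural features: bias, length, relative
    position in the cluster, and mean cluster frequency of the words."""
    toks = tokenize(sentence)
    mean_freq = sum(cluster_freq.get(t, 0.0) for t in toks) / len(toks) if toks else 0.0
    return [1.0, len(toks) / 50.0, position / max(n_sentences - 1, 1), mean_freq]


def fit_ridge(X, y, lam=1e-3):
    """Ridge regression (assumed model choice): solve (X^T X + lam*I) w = X^T y
    by Gauss-Jordan elimination with partial pivoting. The bias is penalized too."""
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    for col in range(d):
        p = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[p], b[col], b[p] = A[p], A[col], b[p], b[col]
        for r in range(d):
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, d):
                    A[r][c] -= f * A[col][c]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(d)]


def summarize(sentences, weights, cluster_freq, budget=40, max_sim=0.5):
    """Score sentences with the linear model, then greedily keep the highest
    scoring ones that fit the word budget and are not near-duplicates."""
    n = len(sentences)
    scored = [(sum(w * x for w, x in zip(weights, features(s, i, n, cluster_freq))), i, s)
              for i, s in enumerate(sentences)]
    scored.sort(key=lambda t: t[0], reverse=True)  # stable: ties keep input order
    chosen, used = [], 0
    for _, i, s in scored:
        toks = tokenize(s)
        if used + len(toks) > budget:
            continue
        if any(jaccard(set(toks), set(tokenize(c))) > max_sim for _, c in chosen):
            continue
        chosen.append((i, s))
        used += len(toks)
    return [s for _, s in sorted(chosen)]  # restore original sentence order


if __name__ == "__main__":
    # Toy training cluster and human summary, invented for illustration.
    train = ["The storm hit the coast on Monday.",
             "Officials said thousands lost power after the storm.",
             "A local bakery reopened last week.",
             "The storm caused flooding across the coast."]
    human = "A storm hit the coast on Monday, causing floods and power cuts."
    freq = relative_freq(train)
    X = [features(s, i, len(train), freq) for i, s in enumerate(train)]
    y = [proxy_score(s, human) for s in train]
    w = fit_ridge(X, y)
    # Apply the learned weights to a new, unseen cluster.
    new_docs = ["Rain fell across the region overnight.",
                "Rain fell across the region overnight again.",
                "Schools closed as rivers rose."]
    print(summarize(new_docs, w, relative_freq(new_docs), budget=12))
```

The split mirrors the abstract's design: target scores are computed only where human summaries exist (training clusters), while the regression model needs nothing but sentence features, so it can score sentences of new documents that have no human summary. The redundancy threshold `max_sim` and the word `budget` are illustrative parameters, not values from the paper.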