Quantifying the limits and success of extractive summarization systems across domains

Authors:
Hakan Ceylan;Rada Mihalcea;Umut Özertem;Elena Lloret;Manuel Palomar
Affiliations:
University of North Texas, Denton, TX;University of North Texas, Denton, TX;Yahoo! Labs, Sunnyvale, CA;University of Alicante, Alicante, Spain;University of Alicante, Alicante, Spain
Venue:
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2010

Abstract

This paper analyzes the topic identification stage of single-document automatic text summarization across four domains: newswire, literary, scientific, and legal documents. We present a study that explores the summary space of each domain via an exhaustive search strategy and finds the probability density function (pdf) of the ROUGE score distribution for each domain. We then use this pdf to calculate the percentile rank of extractive summarization systems. Our results introduce a new way to judge the success of automatic summarization systems and offer quantified explanations for questions such as why it has been so hard for systems to date to achieve a statistically significant improvement over the lead baseline in the news domain.
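
The idea of ranking a system against the whole summary space can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified ROUGE-1 recall (clipped unigram overlap, whitespace tokenization, no stemming or stopword handling), a fixed summary length of k sentences, and a brute-force enumeration that is feasible only for very short documents. The function names and the toy document are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rouge1_recall(candidate_tokens, reference_tokens):
    """Simplified ROUGE-1 recall: clipped unigram overlap / reference length."""
    if not reference_tokens:
        return 0.0
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / len(reference_tokens)


def summary_space_scores(sentences, reference, k):
    """Score every k-sentence extract of the document (exhaustive search)."""
    ref_tokens = reference.lower().split()
    scores = []
    for combo in combinations(sentences, k):
        tokens = " ".join(combo).lower().split()
        scores.append(rouge1_recall(tokens, ref_tokens))
    return scores


def percentile_rank(scores, system_score):
    """Percentage of extracts that score strictly below the system's score."""
    below = sum(1 for s in scores if s < system_score)
    return 100.0 * below / len(scores)


# Toy document (assumed data, for illustration only).
sentences = [
    "the cat sat on the mat",
    "dogs bark loudly",
    "the mat was red",
    "birds sing",
]
reference = "the cat sat on the red mat"
k = 2

scores = summary_space_scores(sentences, reference, k)

# Lead baseline: the first k sentences of the document.
lead_tokens = " ".join(sentences[:k]).lower().split()
lead_score = rouge1_recall(lead_tokens, reference.lower().split())
lead_rank = percentile_rank(scores, lead_score)
```

Here the empirical list of scores stands in for the pdf described in the abstract; a percentile rank close to 100 for the lead baseline would mean that few extracts in the summary space can beat it.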