Automatically evaluating content selection in summarization without human models

Authors:
Annie Louis;Ani Nenkova
Affiliations:
University of Pennsylvania;University of Pennsylvania
Venue:
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1
Year:
2009

Abstract

We present a fully automatic method for content selection evaluation in summarization that does not require the creation of human model summaries. Our work capitalizes on the assumption that the distribution of words in the input and in an informative summary of that input should be similar. Results on a large-scale evaluation from the Text Analysis Conference show that input-summary comparisons are very effective for the evaluation of content selection. Our automatic methods rank participating systems similarly to manual model-based pyramid evaluation and to manual human judgments of responsiveness. The best feature, Jensen-Shannon divergence, leads to a correlation as high as 0.88 with manual pyramid and 0.73 with responsiveness evaluations.
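
To make the input-summary comparison concrete, the following is a minimal sketch of Jensen-Shannon divergence between the word distribution of an input and that of a summary. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify tokenization, stemming, stopword handling, or smoothing, so the regex tokenizer, the unsmoothed maximum-likelihood estimates, and the function names `tokenize`, `kl_divergence`, and `js_divergence` are all hypothetical choices made here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; the abstract does not
    # describe the preprocessing actually used.
    return re.findall(r"[a-z0-9']+", text.lower())


def kl_divergence(p, q):
    # KL(p || q) in bits; terms with p[w] == 0 contribute nothing.
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


def js_divergence(input_text, summary_text):
    """Jensen-Shannon divergence between input and summary word distributions.

    With log base 2 the value lies in [0, 1]; lower means the summary's
    word distribution is closer to that of the input.
    """
    input_counts = Counter(tokenize(input_text))
    summary_counts = Counter(tokenize(summary_text))
    n_input = sum(input_counts.values())
    n_summary = sum(summary_counts.values())
    if n_input == 0 or n_summary == 0:
        raise ValueError("both texts must contain at least one token")

    # Unsmoothed relative frequencies over the joint vocabulary (assumption).
    vocab = set(input_counts) | set(summary_counts)
    p = {w: input_counts[w] / n_input for w in vocab}
    q = {w: summary_counts[w] / n_summary for w in vocab}

    # The mixture m is nonzero wherever p or q is nonzero, so the KL terms
    # below never divide by zero.
    m = {w: 0.5 * (p[w] + q[w]) for w in vocab}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Under this sketch, a system could be scored by computing `js_divergence(input_text, summary_text)` for each of its summaries and averaging; systems with lower average divergence would then be ranked higher. How scores are aggregated in the paper itself is not stated in the abstract.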