The Pyramid Method: Incorporating human content selection variation in summarization evaluation

  • Authors: Ani Nenkova, Rebecca Passonneau, Kathleen McKeown
  • Affiliation: Columbia University, New York, NY (all authors)

  • Venue: ACM Transactions on Speech and Language Processing (TSLP)
  • Year: 2007

Abstract

Human variation in content selection in summarization has given rise to some fundamental research questions: How can one incorporate the observed variation into suitable evaluation measures? How can such measures reflect the fact that summaries conveying different content can be equally good and informative? In this article, we address these questions by proposing a method for analyzing multiple human abstracts into semantic content units (SCUs). Such analysis allows us not only to quantify human variation in content selection, but also to assign empirical importance weights to different content units. It serves as the basis for an evaluation method, the Pyramid Method, that incorporates the observed variation and is predictive of different, equally informative summaries. We discuss the reliability of content unit annotation, the properties of Pyramid scores, and their correlation with other evaluation methods.
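
The article gives the full definitions, but the scoring idea described in the abstract can be sketched in a few lines of Python. In this hypothetical sketch, each SCU is weighted by the number of model (human) summaries that express it, and a peer summary's score is the total weight of the SCUs it expresses divided by the maximum weight achievable by a summary expressing the same number of SCUs. The function names and toy data are illustrative, not from the paper.

```python
from collections import Counter

def scu_weights(model_summaries_scus):
    """Weight each Semantic Content Unit (SCU) by the number of model
    (reference) summaries that express it; an SCU expressed by n of the
    models sits in tier n of the pyramid."""
    weights = Counter()
    for scus in model_summaries_scus:
        for scu in set(scus):          # count each SCU once per summary
            weights[scu] += 1
    return weights

def pyramid_score(peer_scus, weights):
    """Pyramid-style score: the total weight D of SCUs expressed by the
    peer, divided by the maximum weight achievable by any summary that
    expresses the same number of SCUs."""
    expressed = set(peer_scus)
    d = sum(weights.get(scu, 0) for scu in expressed)
    # Max = sum of the |expressed| largest SCU weights in the pyramid.
    top = sorted(weights.values(), reverse=True)[:len(expressed)]
    return d / sum(top) if top else 0.0

# Hypothetical toy data: four model summaries annotated into SCU labels.
models = [["A", "B", "C"], ["A", "B"], ["A", "C", "D"], ["A", "B", "E"]]
w = scu_weights(models)                   # A:4, B:3, C:2, D:1, E:1
print(pyramid_score(["A", "C", "E"], w))  # (4+2+1) / (4+3+2) = 7/9
```

Normalizing by the best achievable weight for the same number of SCUs, rather than by the total pyramid weight, is what lets two summaries that select different but equally heavily weighted content receive the same score, which is the variation the abstract sets out to accommodate.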