The most widely adopted approaches for evaluating summary content follow some protocol for comparing a summary with gold-standard human summaries, traditionally called model summaries. This evaluation paradigm falls short when human summaries are unavailable and becomes less accurate when only a single model is available. We propose three novel evaluation techniques. Two of them are model-free and do not rely on a gold standard for the assessment. The third improves standard automatic evaluations by expanding the set of available model summaries with chosen system summaries. We show that quantifying the similarity between the source text and its summary with appropriately chosen measures produces summary scores that replicate human assessments accurately. We also explore ways of increasing evaluation quality when only one human model summary is available as a gold standard. We introduce pseudomodels, which are system summaries deemed to contain good content according to automatic evaluation. Combining the pseudomodels with the single human model to form the gold standard leads to higher correlations with human judgments than using only the one available model. Finally, we explore the feasibility of another measure: similarity between a system summary and the pool of all other system summaries for the same input. This method of comparison with the consensus of systems produces impressively accurate rankings of system summaries, achieving correlation with human rankings above 0.9.
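The two model-free ideas above can be sketched concretely. The snippet below scores a summary by the similarity of its word distribution to the source text's, and scores a system summary against the pool of other system summaries. The choice of Jensen-Shannon divergence as the similarity measure is an assumption for illustration; the abstract only says "appropriately chosen measures", and all function names here are hypothetical.

```python
# Sketch of model-free summary content evaluation.
# Assumption: Jensen-Shannon divergence over unigram distributions
# stands in for the paper's "appropriately chosen measures".
import math
from collections import Counter


def word_dist(text, vocab):
    """Unigram probability distribution of `text` over `vocab`."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in vocab]


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions (base 2)."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def content_score(source, summary):
    """Model-free score: how close is the summary's word
    distribution to the source's? Higher is better."""
    vocab = sorted(set(source.lower().split()) | set(summary.lower().split()))
    p = word_dist(source, vocab)
    q = word_dist(summary, vocab)
    # Lower divergence means the summary is distributionally closer
    # to the source, so negate to get a higher-is-better score.
    return -js_divergence(p, q)


def consensus_score(summary, peer_summaries):
    """Score a system summary against the pool of all other system
    summaries for the same input (the consensus comparison)."""
    return sum(content_score(peer, summary)
               for peer in peer_summaries) / len(peer_summaries)
```

In this sketch, ranking systems by averaging `content_score` (or `consensus_score`) over all inputs would correspond to the system-level rankings whose correlation with human judgments is reported above.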