Examining the consensus between human summaries: initial experiments with factoid analysis

Authors:
Hans van Halteren; Simone Teufel
Affiliations:
University of Nijmegen, The Netherlands; Cambridge University, UK
Venue:
HLT-NAACL-DUC '03 Proceedings of the HLT-NAACL 03 on Text summarization workshop - Volume 5
Year:
2003


Abstract

We present a new approach to summary evaluation which combines two novel aspects, namely (a) content comparison between gold standard summary and system summary via factoids, a pseudo-semantic representation based on atomic information units which can be robustly marked in text, and (b) use of a gold standard consensus summary, in our case based on 50 individual summaries of one text. Even though future work on more than one source text is imperative, our experiments indicate that (1) ranking with regard to a single gold standard summary is insufficient as rankings based on any two randomly chosen summaries are very dissimilar (correlations average ρ = 0.20), (2) a stable consensus summary can only be expected if a larger number of summaries are collected (in the range of at least 30 to 40 summaries), and (3) similarity measurement using unigrams shows a similarly low ranking correlation when compared with factoid-based ranking.
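The ranking comparison described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes each summary has already been annotated as a set of factoid identifiers, it scores a candidate by simple factoid recall against one gold standard summary (the paper's actual scoring may differ), and it compares the rankings induced by two different gold standards with Spearman's ρ. All function names and the toy data are hypothetical.

```python
from statistics import mean


def factoid_recall(candidate, gold):
    """Fraction of the gold summary's factoids that the candidate also contains.

    Assumption: recall is used here only as an illustrative score.
    """
    return len(candidate & gold) / len(gold) if gold else 0.0


def average_ranks(values):
    """Rank values (1 = smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant ranking has no defined correlation
    return cov / (var_x * var_y) ** 0.5


# Hypothetical factoid annotations: integers stand for factoid identifiers.
gold_a = {1, 2, 3, 4}
gold_b = {3, 4, 5, 6}
candidates = [{1, 2}, {3, 4}, {5, 6}, {1, 2, 3, 4, 5}]

scores_a = [factoid_recall(c, gold_a) for c in candidates]
scores_b = [factoid_recall(c, gold_b) for c in candidates]
print("rho between the two gold-standard rankings:", spearman_rho(scores_a, scores_b))
```

Averaging this ρ over many randomly chosen pairs of gold standard summaries would give a figure comparable in kind to the average correlation reported in the abstract, under the scoring assumption stated above.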