We present a new approach to summary evaluation which combines two novel aspects, namely (a) content comparison between gold standard summary and system summary via factoids, a pseudo-semantic representation based on atomic information units which can be robustly marked in text, and (b) use of a gold standard consensus summary, in our case based on 50 individual summaries of one text. Even though future work on more than one source text is imperative, our experiments indicate that (1) ranking with regard to a single gold standard summary is insufficient as rankings based on any two randomly chosen summaries are very dissimilar (correlations average ρ = 0.20), (2) a stable consensus summary can only be expected if a larger number of summaries are collected (in the range of at least 30 to 40 summaries), and (3) similarity measurement using unigrams shows a similarly low ranking correlation when compared with factoid-based ranking.
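The ranking comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each summary is represented as a set of factoid identifiers, that a system is scored by the fraction of a gold standard's factoids it contains, and that ρ is Spearman's rank correlation. The function names, the scoring rule, and the toy data are all hypothetical.

```python
from statistics import mean


def factoid_recall(system, gold):
    """Fraction of the gold summary's factoids found in the system summary.

    Assumed scoring rule for illustration; both arguments are sets of factoid ids.
    """
    return len(system & gold) / len(gold) if gold else 0.0


def average_ranks(scores):
    """Rank scores from best (1) to worst; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # rho is undefined for a constant ranking; 0.0 is a convention here
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: two individual gold summaries and three system summaries.
gold_a = {"f1", "f2", "f3"}
gold_b = {"f3", "f4", "f5"}
systems = [{"f1", "f2"}, {"f3", "f4"}, {"f4", "f5"}]

scores_a = [factoid_recall(s, gold_a) for s in systems]
scores_b = [factoid_recall(s, gold_b) for s in systems]
rho = spearman(scores_a, scores_b)
print(f"rho between the two gold-standard rankings: {rho:.3f}")
```

In this toy case the two gold summaries order the systems almost oppositely, which is the kind of instability that motivates scoring against a consensus built from many summaries rather than a single one.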