In this paper we propose a unified framework for the automatic evaluation of NLP applications using N-gram co-occurrence statistics. The automatic evaluation metrics proposed to date for Machine Translation and Automatic Summarization are particular instances of the family of metrics we propose. We show that different members of this family best explain the variations observed in human evaluations, depending on the application being evaluated (Machine Translation, Automatic Summarization, or Automatic Question Answering) and on the guidelines humans follow when evaluating that application.
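As an illustration, here is a minimal sketch (not the paper's actual implementation) of the n-gram co-occurrence counts underlying this family of metrics: a BLEU-style precision and a ROUGE-style recall are simply two normalizations of the same clipped overlap count. The function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, with multiplicities."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_cooccurrence(candidate, reference, n):
    """Clipped n-gram co-occurrence between a candidate and a reference.

    Returns (precision, recall): precision normalizes the overlap by the
    candidate's n-gram count (BLEU-style); recall normalizes it by the
    reference's n-gram count (ROUGE-style).
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # clipped co-occurrence count
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall

# Example: 3 of the 5 candidate bigrams also occur in the reference.
p, r = ngram_cooccurrence("the cat sat on the mat",
                          "the cat is on the mat", 2)
```

Different applications would then weight or combine such counts differently, which is the degree of freedom that distinguishes members of the metric family.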