Studies assessing rating scales are very common in psychology and related fields, but rare in NLP. In this paper we assess discrete and continuous scales used for measuring quality judgements of computer-generated language. We conducted six separate experiments designed to investigate the validity, reliability, stability, interchangeability and sensitivity of discrete vs. continuous scales. We show that continuous scales are viable for use in language evaluation and offer distinct advantages over discrete scales.
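One of the reliability questions above (do two raters agree more on one scale type than the other?) can be probed by correlating the two raters' scores per item. The sketch below is a hypothetical illustration with invented ratings, not the paper's data or method; `pearson` is a plain implementation of the Pearson correlation coefficient.

```python
# Hypothetical sketch of an inter-rater reliability check for discrete
# (e.g. 1-5 Likert) vs. continuous (e.g. 0-100 slider) quality ratings.
# All ratings below are invented illustration data.

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Two raters judging the same six generated sentences.
discrete_a   = [4, 2, 5, 3, 4, 1]          # rater A, 1-5 scale
discrete_b   = [4, 3, 5, 3, 3, 2]          # rater B, 1-5 scale
continuous_a = [78.0, 31.5, 96.0, 55.0, 70.5, 12.0]  # rater A, 0-100 slider
continuous_b = [81.0, 40.0, 92.5, 52.0, 62.0, 20.5]  # rater B, 0-100 slider

print("discrete agreement:  ", round(pearson(discrete_a, discrete_b), 3))
print("continuous agreement:", round(pearson(continuous_a, continuous_b), 3))
```

Higher correlation on one scale type would suggest better inter-rater reliability for that type; a real study would use many raters and items, and complement correlation with agreement coefficients suited to each scale.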