Rating computer-generated questions with Mechanical Turk

Authors:
Michael Heilman;Noah A. Smith
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
Year:
2010

Citing 8
Cited 4

Two Experiments on Learning Probabilistic Dependency Grammars from Corpora

Two Experiments on Learning Probabilistic Dependency Grammars from Corpora
Generation that exploits corpus-based statistical knowledge

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Unsupervised word sense disambiguation rivaling supervised methods

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
SPoT: a trainable sentence planner

NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Cheap and fast---but is it good?: evaluating non-expert annotations for natural language tasks

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Fast, cheap, and creative: evaluating translation quality using Amazon's Mechanical Turk

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Good question! Statistical ranking for question generation

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Creating speech and language data with Amazon's Mechanical Turk

CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
How good is the crowd at "real" WSD?

LAW V '11 Proceedings of the 5th Linguistic Annotation Workshop
Impacts of machine translation and speech synthesis on speech-to-speech translation

Speech Communication
Mind the gap: learning to choose gaps for question generation

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Quantified Score

Hi-index	0.00

Visualization

Abstract

We use Amazon Mechanical Turk to rate computer-generated reading comprehension questions about Wikipedia articles. Such application-specific ratings can be used to train statistical rankers to improve systems' final output, or to evaluate technologies that generate natural language. We discuss the question rating scheme we developed, assess the quality of the ratings that we gathered through Amazon Mechanical Turk, and show evidence that these ratings can be used to improve question generation.