Cheap, fast and good enough: automatic speech recognition with non-expert transcription

Authors:
Scott Novotney;Chris Callison-Burch
Affiliations:
Johns Hopkins University;Johns Hopkins University
Venue:
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2010

Citing 4
Cited 16

Getting more mileage from web text sources for conversational speech language modeling using class-dependent mixtures

NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Cheap and fast---but is it good?: evaluating non-expert annotations for natural language tasks

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Feasibility of human-in-the-loop minimum error rate training

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Fast, cheap, and creative: evaluating translation quality using Amazon's Mechanical Turk

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1

Creating speech and language data with Amazon's Mechanical Turk

CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
Shared task: crowdsourced accessibility elicitation of Wikipedia articles

CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
Using Amazon Mechanical Turk for transcription of non-native speech

CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
Using the Amazon Mechanical Turk to transcribe and annotate meeting speech for extractive summarization

CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
They can help: using crowdsourcing to improve the evaluation of grammatical error detection systems

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
The importance of visual context clues in multimedia translation

CLEF'11 Proceedings of the Second international conference on Multilingual and multimodal information access evaluation
An iterative reliability measure for semi-anonymous annotators

Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Crowdsourcing research opportunities: lessons from natural language processing

Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies
Crowdsourcing micro-level multimedia annotations: the challenges of evaluation and interface

Proceedings of the ACM multimedia 2012 workshop on Crowdsourcing for multimedia
Constructing parallel corpora for six Indian languages via crowdsourcing

WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation
Speech for Content Creation

International Journal of Mobile Human Computer Interaction
An introduction to crowdsourcing for language and multimedia technology research

PROMISE'12 Proceedings of the 2012 international conference on Information Retrieval Meets Information Visualization
Learning from multiple annotators: Distinguishing good from random labelers

Pattern Recognition Letters
Using different acoustic, lexical and language modeling units for ASR of an under-resourced language - Amharic

Speech Communication
Toward crowdsourcing micro-level behavior annotations: the challenges of interface, training, and generalization

Proceedings of the 19th international conference on Intelligent User Interfaces
Bucking the trend: improved evaluation and annotation practices for ESL error detection systems

Language Resources and Evaluation

Quantified Score

Hi-index	0.00

Visualization

Abstract

Deploying an automatic speech recognition system with reasonable performance requires expensive and time-consuming in-domain transcription. Previous work demonstrated that non-professional annotation through Amazon's Mechanical Turk can match professional quality. We use Mechanical Turk to transcribe conversational speech for as little as one thirtieth the cost of professional transcription. The higher disagreement of non-professional transcribers does not have a significant effect on system performance. While previous work demonstrated that redundant transcription can improve data quality, we found that resources are better spent collecting more data. Finally, we describe a quality control method without needing professional transcription.