Using the Amazon Mechanical Turk to transcribe and annotate meeting speech for extractive summarization

Authors:
Matthew Marge; Satanjeev Banerjee; Alexander I. Rudnicky
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA
Venue:
CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
Year:
2010

Citing 5

Segmenting meetings into agenda items by extracting implicit supervision from human note-taking
Proceedings of the 12th international conference on Intelligent user interfaces

Correlation between ROUGE and human evaluation of extractive meeting summaries
HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers

A skip-chain conditional random field for ranking meeting utterances by importance
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

Detecting the noteworthiness of utterances in human meetings
SIGDIAL '09 Proceedings of the SIGDIAL 2009 Conference: The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Cheap, fast and good enough: automatic speech recognition with non-expert transcription
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Cited 2

Creating speech and language data with Amazon's Mechanical Turk
CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk

Using different acoustic, lexical and language modeling units for ASR of an under-resourced language - Amharic
Speech Communication

Abstract

Due to its complexity, meeting speech poses a challenge for both transcription and annotation. While Amazon's Mechanical Turk (MTurk) has been shown to produce good results for some types of speech, its suitability for transcription and annotation of spontaneous speech has not been established. We find that MTurk can be used to produce high-quality transcription and describe two techniques for doing so (voting and corrective). We also show that a similar approach can produce high-quality annotations useful for summarization systems. In both cases, accuracy is comparable to that obtained using trained personnel.