Eliciting natural speech from non-native users: collecting speech data for LVCSR

Authors:
Laura Mayfield Tomokiyo;Susanne Burger
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
ASSESSEVALNLP '99 Proceedings of a Symposium on Computer Mediated Language Assessment and Evaluation in Natural Language Processing
Year:
1999

Cited by:
You're not from 'round here, are you?: naive Bayes detection of non-native utterance text. NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Not all wizards are from Oz: Iterative design of intelligent learning environments by communication capacity tapering. Computers & Education

Abstract

In this paper, we discuss the design of a database of recorded and transcribed read and spontaneous speech from semi-fluent, strongly accented non-native speakers of English. While many speech applications work best with a recognizer that expects native-like usage, others could benefit from a speech recognition component that is forgiving of the sorts of errors that are not a barrier to communication; training such a recognizer requires a database of non-native speech. We examine how collecting data from non-native speakers must differ from collecting data from native speakers, and describe our work to develop an appropriate scenario, recording setup, and optimal surroundings for recording.