Highly accurate children's speech recognition for interactive reading tutors using subword units

Authors:
Andreas Hagen;Bryan Pellom;Ronald Cole
Affiliations:
Center for Spoken Language Research, University of Colorado at Boulder, 1777 Exposition Drive, Suite #171, Boulder, CO 80301, USA;Center for Spoken Language Research, University of Colorado at Boulder, 1777 Exposition Drive, Suite #171, Boulder, CO 80301, USA;Center for Spoken Language Research, University of Colorado at Boulder, 1777 Exposition Drive, Suite #171, Boulder, CO 80301, USA
Venue:
Speech Communication
Year:
2007

Abstract

Speech technology offers great promise in the field of automated literacy and reading tutors for children. In such applications, speech recognition can be used to track the child's reading position, to detect oral reading miscues, and to assess comprehension of the text being read, either by estimating whether the prosodic structure of the speech is appropriate to the discourse structure of the story or by engaging the child in interactive dialogs to assess and train comprehension. Despite this promise, speech recognition systems exhibit higher error rates for children because of variability in vocal tract length, formant frequency, pronunciation, and grammar. When children read out loud, these problems are compounded by difficulties in recognizing printed words, which affect speech production and cause pauses, repeated syllables, and other phenomena. To overcome these challenges, we present advances in speech recognition that improve accuracy and modeling capability in the context of an interactive literacy tutor for children. Specifically, this paper focuses on a novel set of speech recognition techniques that can be applied to improve oral reading recognition. First, we demonstrate that speech recognition error rates for interactive read-aloud can be reduced by more than 50% through a combination of advances in both statistical language modeling and acoustic modeling. Next, we propose extending our baseline system with a novel token-passing search architecture targeting subword-unit-based speech recognition. The proposed subword-unit-based framework is shown to provide accuracy equivalent to that of a whole-word-based speech recognizer while enabling detection of oral reading events and finer-grained speech analysis during recognition. The efficacy of the approach is demonstrated using data collected from children in grades 3-5: 34.6% of partial words with reasonable evidence in the speech signal are detected at a low false alarm rate of 0.5%.
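
The detection figures quoted in the abstract are a pair of rates: the share of partial words that are found, and the share of other items that are wrongly flagged. The following is a minimal sketch of how such a pair might be computed from raw counts. It assumes the false alarm rate is normalized by the number of items that are not partial words, which the abstract does not specify. The function names and the counts are hypothetical and are not taken from the paper.

```python
def detection_rate(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual partial words that were flagged as partial words."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no partial words in the reference")
    return true_positives / total


def false_alarm_rate(false_positives: int, true_negatives: int) -> float:
    """Fraction of non-partial-word items wrongly flagged as partial words.

    Assumption: the denominator is the count of non-partial-word items.
    The paper may normalize differently.
    """
    total = false_positives + true_negatives
    if total == 0:
        raise ValueError("no non-partial-word items in the reference")
    return false_positives / total


# Hypothetical counts for illustration only (not the paper's data).
hits, misses = 30, 70
false_alarms, correct_rejections = 2, 198

print(f"detection rate:   {detection_rate(hits, misses):.1%}")
print(f"false alarm rate: {false_alarm_rate(false_alarms, correct_rejections):.1%}")
```

Reporting the two rates together matters because a detector can trivially raise its detection rate by flagging more items, at the cost of more false alarms.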