Advances in children's speech recognition within an interactive literacy tutor

Authors:
Andreas Hagen;Bryan Pellom;Sarel Van Vuuren;Ronald Cole
Affiliations:
University of Colorado at Boulder;University of Colorado at Boulder;University of Colorado at Boulder;University of Colorado at Boulder
Venue:
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Year:
2004

Abstract

In this paper, we present recent advances in acoustic and language modeling that improve recognition performance when children read aloud within digital books. First, we extend previous work by incorporating cross-utterance word history information and dynamic n-gram language modeling. By additionally incorporating Vocal Tract Length Normalization (VTLN), Speaker-Adaptive Training (SAT), and iterative unsupervised structural maximum a posteriori linear regression (SMAPLR) adaptation, we demonstrate a 54% reduction in word error rate. Next, we show how data from children's read-aloud sessions can be used to improve accuracy in a spontaneous story summarization task, achieving an error reduction of 15% over previously published results. Finally, we describe a novel real-time implementation of our research system that incorporates time-adaptive acoustic and language modeling.
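
The abstract reports its results as reductions in word error rate (WER). As a reading aid, the sketch below shows how WER is commonly computed (word-level edit distance divided by the number of reference words) and how a relative reduction between two systems can be derived. This is a minimal illustration and not the authors' scoring tool: the function names `word_error_rate` and `relative_reduction` and the example sentences are hypothetical, and the abstract does not state whether its percentages are relative or absolute.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical example sentences, not data from the paper.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on mat"
    print(word_error_rate(reference, hypothesis))
    # Hypothetical baseline and improved WER values.
    print(relative_reduction(0.20, 0.10))
```

Under the relative-reduction reading, a system that halves a baseline WER has a reduction of 0.5 regardless of the absolute error level, which is why both the baseline and the improved figures are needed to interpret a percentage such as those quoted above.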