Highly accurate children's speech recognition for interactive reading tutors using subword units

  • Authors:
  • Andreas Hagen;Bryan Pellom;Ronald Cole

  • Affiliations:
  • Center for Spoken Language Research, University of Colorado at Boulder, 1777 Exposition Drive, Suite #171, Boulder, CO 80301, USA;Center for Spoken Language Research, University of Colorado at Boulder, 1777 Exposition Drive, Suite #171, Boulder, CO 80301, USA;Center for Spoken Language Research, University of Colorado at Boulder, 1777 Exposition Drive, Suite #171, Boulder, CO 80301, USA

  • Venue:
  • Speech Communication
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Speech technology offers great promise in the field of automated literacy and reading tutors for children. In such applications speech recognition can be used to track the reading position of the child, detect oral reading miscues, assessing comprehension of the text being read by estimating if the prosodic structure of the speech is appropriate to the discourse structure of the story, or by engaging the child in interactive dialogs to assess and train comprehension. Despite such promises, speech recognition systems exhibit higher error rates for children due to variabilities in vocal tract length, formant frequency, pronunciation, and grammar. In the context of recognizing speech while children are reading out loud, these problems are compounded by speech production behaviors affected by difficulties in recognizing printed words that cause pauses, repeated syllables and other phenomena. To overcome these challenges, we present advances in speech recognition that improve accuracy and modeling capability in the context of an interactive literacy tutor for children. Specifically, this paper focuses on a novel set of speech recognition techniques which can be applied to improve oral reading recognition. First, we demonstrate that speech recognition error rates for interactive read aloud can be reduced by more than 50% through a combination of advances in both statistical language and acoustic modeling. Next, we propose extending our baseline system by introducing a novel token-passing search architecture targeting subword unit based speech recognition. The proposed subword unit based speech recognition framework is shown to provide equivalent accuracy to a whole-word based speech recognizer while enabling detection of oral reading events and finer grained speech analysis during recognition. The efficacy of the approach is demonstrated using data collected from children in grades 3-5, namely 34.6% of partial words with reasonable evidence in the speech signal are detected at a low false alarm rate of 0.5%.