Performance through consistency: connectionist large vocabulary continuous speech recognition

  • Authors:
  • Joe Tebelskis

  • Affiliations:
  • School of Computer Science, Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • ICASSP'93: Proceedings of the 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing: Speech Processing, Volume II
  • Year:
  • 1993

Abstract

Connectionist speech recognition systems are often handicapped by an inconsistency between training and testing criteria. This problem is addressed by the Multi-State Time Delay Neural Network (MS-TDNN), a hierarchical phoneme and word classifier that uses dynamic time warping (DTW) to modulate its connectivity pattern and is trained directly on word-level targets. Using word accuracy consistently as the criterion during both training and testing yields very high system performance, even with limited training data. Until now, the MS-TDNN has been applied primarily to small-vocabulary recognition and word-spotting tasks. In this paper we apply the architecture to large-vocabulary continuous speech recognition and demonstrate that our MS-TDNN outperforms all other systems that have been tested on the CMU Conference Registration database.
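The idea of a word score obtained by DTW over phoneme-level outputs can be illustrated with a small sketch. This is not the MS-TDNN implementation from the paper; it assumes a hypothetical matrix of per-frame phoneme-state activations and a word model given as an ordered list of state indices, and it scores the word as the average activation along the best monotonic left-to-right alignment (each frame either stays in the current state or advances by one). The names `dtw_word_score`, `activations`, and `word_states` are illustrative only.

```python
from typing import Sequence

NEG_INF = float("-inf")


def dtw_word_score(activations: Sequence[Sequence[float]],
                   word_states: Sequence[int]) -> float:
    """Average activation along the best DTW alignment of frames to word states.

    activations[t][s]: hypothetical activation of phoneme state s at frame t.
    word_states: state indices forming the word model, in temporal order.
    The path starts in the first state, ends in the last, and at each frame
    either stays in the same state or advances to the next one.
    """
    T, N = len(activations), len(word_states)
    if N == 0 or T < N:
        return NEG_INF  # every state needs at least one frame
    # best[j]: best cumulative score of a path ending in word state j at frame t
    best = [NEG_INF] * N
    best[0] = activations[0][word_states[0]]
    for t in range(1, T):
        new = [NEG_INF] * N
        for j in range(N):
            prev = best[j]  # stay in state j
            if j > 0 and best[j - 1] > prev:
                prev = best[j - 1]  # advance from state j-1
            if prev != NEG_INF:
                new[j] = prev + activations[t][word_states[j]]
        best = new
    # Normalize by utterance length so words of different lengths are comparable
    return best[N - 1] / T


# Usage with made-up activations: 4 frames, 3 phoneme states
acts = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.1, 0.7],
]
print(dtw_word_score(acts, [0, 1, 2]))  # best path 0,0,1,2 -> (0.9+0.8+0.9+0.7)/4
```

In a classifier built on this idea, the hypothesized word would be the vocabulary entry with the highest score; continuous recognition would additionally require a search over word sequences, which this sketch omits.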