Performance through consistency: connectionist large vocabulary continuous speech recognition

Authors:
Joe Tebelskis
Affiliations:
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Venue:
ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II
Year:
1993

Abstract

Connectionist speech recognition systems are often handicapped by an inconsistency between training and testing criteria. This problem is addressed by the Multi-State Time Delay Neural Network (MS-TDNN), a hierarchical phoneme and word classifier that uses dynamic time warping (DTW) to modulate its connectivity pattern and is trained directly on word-level targets. Using word accuracy consistently as the criterion during both training and testing leads to very high system performance, even with limited training data. Until now, the MS-TDNN has been applied primarily to small-vocabulary recognition and word-spotting tasks. In this paper we apply the architecture to large-vocabulary continuous speech recognition and demonstrate that our MS-TDNN outperforms all other systems that have been tested on the CMU Conference Registration database.
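The abstract describes the MS-TDNN only at a high level. The sketch below is a minimal pure-Python illustration of the two ideas it names, and it is not the paper's implementation. The first idea is a word unit that uses DTW to decide which phoneme-state activations feed it. The second is a word-level training criterion. Several details are assumptions made for illustration: the per-frame state-activation matrix `acts`, the `lexicon` format, the function names `dtw_word_score` and `word_level_loss_and_grad`, the mean-activation word score, and the softmax/cross-entropy objective.

```python
import math
from typing import Dict, List, Sequence, Tuple

Path = List[Tuple[int, int]]  # (frame t, state unit s) pairs on the best alignment


def dtw_word_score(acts: Sequence[Sequence[float]],
                   states: Sequence[int]) -> Tuple[float, Path]:
    """Hypothetical word unit: align frames to a left-to-right state sequence.

    acts[t][s] is the activation of state unit s at frame t; `states` lists the
    state units that make up one word model. At each frame the path either stays
    in the current state or advances to the next one. Returns the mean activation
    along the best path, plus the path itself.
    """
    T, S = len(acts), len(states)
    if T < S:
        raise ValueError("need at least one frame per word state")
    neg = -math.inf
    D = [[neg] * S for _ in range(T)]   # best cumulative activation ending at (t, j)
    back = [[0] * S for _ in range(T)]  # 1 if (t, j) was reached from state j - 1
    D[0][0] = acts[0][states[0]]
    for t in range(1, T):
        for j in range(S):
            stay = D[t - 1][j]
            adv = D[t - 1][j - 1] if j > 0 else neg
            if stay == neg and adv == neg:
                continue  # (t, j) is unreachable
            back[t][j] = 1 if adv > stay else 0
            D[t][j] = max(stay, adv) + acts[t][states[j]]
    # Trace the best path back from the last state at the last frame.
    path: Path = []
    j = S - 1
    for t in range(T - 1, -1, -1):
        path.append((t, states[j]))
        j -= back[t][j]
    path.reverse()
    return D[T - 1][S - 1] / T, path


def word_level_loss_and_grad(acts: Sequence[Sequence[float]],
                             lexicon: Dict[str, Sequence[int]],
                             target: str):
    """Assumed word-level criterion: softmax over DTW word scores + cross-entropy.

    Returns (loss, posteriors, dloss/dacts). The gradient reaches only the
    (t, s) cells on each word's DTW path, so the alignment decides which
    connections are trained for this utterance.
    """
    scores, paths = {}, {}
    for word, states in lexicon.items():
        scores[word], paths[word] = dtw_word_score(acts, states)
    m = max(scores.values())  # subtract the max for a numerically stable softmax
    z = sum(math.exp(v - m) for v in scores.values())
    post = {w: math.exp(v - m) / z for w, v in scores.items()}
    loss = -math.log(post[target])
    T = len(acts)
    grad = [[0.0] * len(acts[0]) for _ in range(T)]
    for word, path in paths.items():
        dscore = post[word] - (1.0 if word == target else 0.0)
        for t, s in path:
            grad[t][s] += dscore / T  # the word score is a mean over T frames
    return loss, post, grad


# Toy usage: 4 frames, 3 state units, two hypothetical two-state words.
acts = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.3, 0.9],
]
lexicon = {"ab": [0, 1], "bc": [1, 2]}
loss, post, grad = word_level_loss_and_grad(acts, lexicon, target="ab")
recognized = max(post, key=post.get)  # the same word scores pick the output at test time
```

Because the DTW path is recomputed for every utterance, the set of (frame, state) cells that receive gradient changes from input to input. This is one way to read "uses DTW to modulate its connectivity pattern." In this sketch, the word scores that are trained against the target word are the same scores used to choose the recognized word. That shared use is the train/test consistency the abstract emphasizes.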