Connectionist speech recognition systems are often handicapped by an inconsistency between training and testing criteria. This problem is addressed by the Multi-State Time Delay Neural Network (MS-TDNN), a hierarchical phoneme and word classifier which uses dynamic time warping (DTW) to modulate its connectivity pattern, and which is trained directly on word-level targets. The consistent use of word accuracy as the criterion during both training and testing leads to very high system performance, even with limited training data. Until now, the MS-TDNN has been applied primarily to small-vocabulary recognition and word-spotting tasks. In this paper we apply the architecture to large-vocabulary continuous speech recognition, and demonstrate that our MS-TDNN outperforms all other systems that have been tested on the CMU Conference Registration database.
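To make the DTW-based word scoring concrete, the sketch below shows how a word-level score can be obtained from frame-level phoneme activations: DTW finds the best monotonic alignment of input frames to the word's phoneme states, and the accumulated score along that path becomes the word's output. This is an illustrative reconstruction, not the paper's implementation; the function name, the use of summed activations as the path score, and the simple stay/advance transition scheme are all assumptions.

```python
import numpy as np

def dtw_word_score(frame_scores, word_phonemes):
    """Score a word by DTW alignment of frames to its phoneme states.

    frame_scores  : (T, P) array of per-frame phoneme activations
                    (e.g. TDNN phoneme-layer outputs).
    word_phonemes : list of phoneme indices spelling out the word.

    Returns the score of the best monotonic alignment, where each
    frame may either stay in the current state or advance to the next.
    """
    T = frame_scores.shape[0]
    S = len(word_phonemes)
    # Local score of frame t aligned to state s of this word.
    local = frame_scores[:, word_phonemes]          # shape (T, S)

    # D[t, s] = best accumulated score ending at (frame t, state s).
    D = np.full((T, S), -np.inf)
    D[0, 0] = local[0, 0]                           # must start in state 0
    for t in range(1, T):
        for s in range(S):
            stay = D[t - 1, s]                      # remain in state s
            advance = D[t - 1, s - 1] if s > 0 else -np.inf
            D[t, s] = max(stay, advance) + local[t, s]

    return D[T - 1, S - 1]                          # must end in last state
```

In an MS-TDNN-style setup, each vocabulary word would be scored this way and the highest-scoring word chosen; because the word score is a (piecewise) differentiable function of the phoneme activations along the winning path, word-level error can be backpropagated through the alignment into the phoneme layer.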