Integrating time alignment and neural networks for high performance continuous speech recognition

Authors:
P. Haffner;M. Franzini;A. Waibel
Affiliations:
Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA;Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA;Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
Venue:
ICASSP '91 Proceedings of the Acoustics, Speech, and Signal Processing, 1991. ICASSP-91., 1991 International Conference
Year:
1991

Citing 0
Cited 9

Activated automotive control of continuous voice using sequential feature segmentation

Proceedings of the International Conference and Workshop on Emerging Trends in Technology
Prototype-based discriminative training for various speech units

ICASSP'92 Proceedings of the 1992 IEEE international conference on Acoustics, speech and signal processing - Volume 1
Rapid connectionist speaker adaptation

ICASSP'92 Proceedings of the 1992 IEEE international conference on Acoustics, speech and signal processing - Volume 1
A speech recognizer optimally combining learning vector quantization, dynamic programming and multi-layer perceptron

ICASSP'92 Proceedings of the 1992 IEEE international conference on Acoustics, speech and signal processing - Volume 1
Connectionist word-level classification in speech recognition

ICASSP'92 Proceedings of the 1992 IEEE international conference on Acoustics, speech and signal processing - Volume 1
Connectionist architectural learning for high performance character and speech recognition

ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: plenary, special, audio, underwater acoustics, VLSI, neural networks - Volume I
Multi-speaker/speaker-independent architectures for the multi-state time delay neural network

ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II
Performance through consistency: connectionist large vocabulary continuous speech recognition

ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II
Prototype-based MCE/GPD training for word spotting and connected word recognition

ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II

Quantified Score

Hi-index	0.00

Visualization

Abstract

The authors describe two systems in which neural network classifiers are merged with dynamic programming (DP) time alignment methods to produce high-performance continuous speech recognizers. One system uses the connectionist Viterbi-training (CVT) procedure, in which a neural network with frame-level outputs is trained using guidance from a time alignment procedure. The other system uses multi-state time-delay neural networks (MS-TDNNs), in which embedded DP time alignment allows network training with only word-level external supervision. The CVT results on the, TI Digits are 99.1% word accuracy and 98.0% string accuracy. The MS-TDNNs are described in detail, with attention focused on their architecture, the training procedure, and results of applying the MS-TDNNs to continuous speaker-dependent alphabet recognition: on two speakers, word accuracy is respectively 97.5% and 89.7%.