Hidden Markov models, maximum mutual information estimation, and the speech recognition problem
Improvements in connected digit recognition using higher order spectral and energy features
Proceedings of ICASSP '91, IEEE International Conference on Acoustics, Speech, and Signal Processing, 1991
High performance connected digit recognition using maximum mutual information estimation
Proceedings of ICASSP '91, IEEE International Conference on Acoustics, Speech, and Signal Processing, 1991
Improvements in connected digit recognition using linear discriminant analysis and mixture densities
Proceedings of ICASSP '93, IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. II (Speech Processing), 1993
Inter-word coarticulation modeling and MMIE training for improved connected digit recognition
Proceedings of ICASSP '93, IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. II (Speech Processing), 1993
An algorithm for the dynamic inference of hidden Markov models (DIHMM)
Proceedings of ICASSP '93, IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. II (Speech Processing), 1993
This paper describes the latest developments by the speech research group at CRIM in speaker-independent connected digit recognition using Hidden Markov Models (HMMs) trained with Maximum Mutual Information Estimation (MMIE). The work presented here is a continuation of previous work described in [1]. The main differences are: 1) use of the 20 kHz TI/NIST corpus available on CD-ROM (instead of the 10 kHz distribution tape), 2) use of word models (instead of sub-word units), 3) addition of second-derivative parameters, and 4) a more elaborate training procedure for codebook exponents. The experiments described in this paper were all performed on the complete adult portion of the corpus. Our baseline system, using discrete HMMs and MMIE, achieves a 0.67% word error rate and a 2.03% string error rate. The paper describes techniques that allowed us to greatly improve the recognition rate. New results include a 0.41% word error rate and a 1.25% string error rate with two models per digit (one for male and one for female speakers) using discrete HMMs.
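For context, the MMIE training criterion referred to above can be sketched in its standard form (this is the textbook objective, not necessarily the exact formulation used in the paper). Given training utterances with acoustic observations $O_r$ and transcriptions $w_r$, MMIE adjusts the HMM parameters $\theta$ to maximize

```latex
\mathcal{F}_{\mathrm{MMIE}}(\theta)
  = \sum_{r} \log
    \frac{P_{\theta}(O_r \mid w_r)\, P(w_r)}
         {\sum_{\hat{w}} P_{\theta}(O_r \mid \hat{w})\, P(\hat{w})}
```

where the sum in the denominator runs over all competing word sequences $\hat{w}$ (in practice, e.g. all digit strings allowed by the task grammar). Unlike maximum likelihood estimation, which optimizes only the numerator, MMIE also drives down the likelihood of competing hypotheses, which is why it tends to reduce recognition errors directly.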