Significance of joint features derived from the modified group delay function in speech processing
EURASIP Journal on Audio, Speech, and Music Processing
Linear discriminant analysis for improved large vocabulary continuous speech recognition
ICASSP'92 Proceedings of the 1992 IEEE international conference on Acoustics, speech and signal processing - Volume 1
Speech recognition in noise using a projection-based likelihood measure for mixture density HMM's
ICASSP'92 Proceedings of the 1992 IEEE international conference on Acoustics, speech and signal processing - Volume 1
Spectral contrast normalization and other techniques for speech recognition in noise
ICASSP'92 Proceedings of the 1992 IEEE international conference on Acoustics, speech and signal processing - Volume 1
Acoustic modelling of subword units in the Isadora speech recognizer
ICASSP'92 Proceedings of the 1992 IEEE international conference on Acoustics, speech and signal processing - Volume 1
Exploiting prediction error in a predictive-based connectionist speech recognition system
ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II
A comparison of composite features under degraded speech in speaker recognition
ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II
ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II
Large vocabulary continuous speech recognition of wall street journal data
ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II
Statistical modelling in continuous speech recognition (CSR)
UAI'01 Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence
Hi-index | 0.00 |
Two acoustic representations, integrated Mel-scale representation with LDA (IMELDA) and perceptual linear prediction-root power sums (PLP-RPS), both of which have given good results in speech recognition tests, are explored. IMELDA is examined in the context of some related representations. Results of speaker-dependent and independent tests with digits and the alphabet suggest that the optimum PLP order is high and that the effectiveness of PLP-RPS stems not from its modeling of perceptual properties but from its approximation to a desirable statistical property attained exactly by IMELDA. A combined PLP-IMELDA representation is found to be generally more effective than PLP-RPS, but an IMELDA representation derived directly from a filter-bank provides similar results to PLP-IMELDA at a lower computational cost.