Time and frequency filtering of filter-bank energies for robust HMM speech recognition
Speech Communication - Special issue on noise robust ASR
Broadcast News Transcription Using HTK
ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97)-Volume 2 - Volume 2
Speaker normalization using efficient frequency warping procedures
ICASSP '96 Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on Conference Proceedings., 1996 IEEE International Conference - Volume 01
A fast stochastic parser for determining phrase boundaries for text-to-speech synthesis
ICASSP '96 Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on Conference Proceedings., 1996 IEEE International Conference - Volume 01
Discriminative input stream combination for conditional random field phone recognition
IEEE Transactions on Audio, Speech, and Language Processing
Phoneme and tonal accent recognition for Thai speech
Expert Systems with Applications: An International Journal
International Journal of Speech Technology
International Journal of Speech Technology
Robustness analysis of eleven linear classifiers in extremely high-dimensional feature spaces
ANNPR'10 Proceedings of the 4th IAPR TC3 conference on Artificial Neural Networks in Pattern Recognition
Integration of multiple acoustic and language models for improved Hindi speech recognition system
International Journal of Speech Technology
In this paper, the use of multiple acoustic feature sets for speech recognition is investigated. We consider combining auditory-motivated and articulatory-motivated features. In addition to a voicing feature, we introduce a recently developed articulatory-motivated feature, the spectrum derivative feature. Features are combined both directly, using linear discriminant analysis (LDA), and indirectly at the model level, using discriminative model combination (DMC). Experimental results are presented for both small- and large-vocabulary tasks. They show that combining auditory- and articulatory-motivated features significantly improves the accuracy of automatic speech recognition systems. On the SieTill task (German digit-string recognition), the word error rate is reduced from 1.8% to 1.5%. Consistent improvements were also obtained on two large-vocabulary corpora: the word error rate is reduced from 19.1% to 18.4% on the VerbMobil II corpus, a German large-vocabulary conversational speech task, and from 14.1% to 13.5% on the British English part of the European Parliament Plenary Sessions (EPPS) task from the 2005 TC-STAR ASR evaluation campaign.
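The abstract names DMC without describing it. The following is a minimal sketch of the general idea, assuming the usual log-linear form: each model (for example, an acoustic model trained on auditory features and one trained on articulatory-motivated features) contributes a weighted log-score per hypothesis, and the weights are chosen to minimize recognition errors on held-out data. All names, the toy N-best lists, and the brute-force grid search (a crude stand-in for DMC's discriminative, error-minimizing weight optimization) are hypothetical and are not taken from the paper.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical data layout: a hypothesis is (word string, one log-score per model).
Hypothesis = Tuple[str, Sequence[float]]
# A dev utterance is (N-best list, reference word string).
DevSet = List[Tuple[List[Hypothesis], str]]


def combined_score(log_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Log-linear combination: sum over models k of lambda_k * log p_k."""
    return sum(w * s for w, s in zip(weights, log_scores))


def decode(nbest: List[Hypothesis], weights: Sequence[float]) -> str:
    """Return the word string with the highest combined score."""
    best_words, _ = max(nbest, key=lambda h: combined_score(h[1], weights))
    return best_words


def count_errors(dev: DevSet, weights: Sequence[float]) -> int:
    """Count utterances whose decoded string differs from the reference."""
    return sum(decode(nbest, weights) != ref for nbest, ref in dev)


def tune_weights(dev: DevSet, grid: Sequence[float], num_models: int) -> Tuple[float, ...]:
    """Grid search for the weight vector with the fewest errors.

    Simplified stand-in: real DMC optimizes the weights discriminatively
    rather than by exhaustive search. Ties go to the first vector tried.
    """
    return min(product(grid, repeat=num_models), key=lambda w: count_errors(dev, w))


# Toy dev set (invented numbers): model 0 and model 1 each get one utterance wrong.
DEV: DevSet = [
    ([("eins zwei", [-10.0, -12.0]), ("eins drei", [-9.0, -15.0])], "eins zwei"),
    ([("drei vier", [-8.0, -14.0]), ("drei acht", [-11.0, -13.0])], "drei vier"),
]

best = tune_weights(DEV, grid=(0.5, 1.0, 2.0), num_models=2)
```

On this toy data, each model alone misrecognizes one of the two utterances, while the equal-weight combination found by the search recognizes both. This illustrates, on a very small scale, why combining complementary feature streams at the model level can lower the error rate.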