Delayed decisions in speech recognition—the case of formants
Pattern Recognition Letters
Experiments in speech recognition using a modular MLP architecture for acoustic modelling
Information Sciences—Informatics and Computer Science: An International Journal - Special issue: Spoken language analysis, modeling and recognition-statistical and adaptive connectionist approaches
A fast stochastic parser for determining phrase boundaries for text-to-speech synthesis
ICASSP '96 Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on Conference Proceedings., 1996 IEEE International Conference - Volume 01
Maximum likelihood discriminant feature spaces
ICASSP '00 Proceedings of the Acoustics, Speech, and Signal Processing, 2000. on IEEE International Conference - Volume 02
Using multiple acoustic feature sets for speech recognition
Speech Communication
Automatic speech recognition and speech variability: A review
Speech Communication
The general use of tying in phoneme-based HMM speech recognisers
ICASSP'92 Proceedings of the 1992 IEEE international conference on Acoustics, speech and signal processing - Volume 1
International Journal of Speech Technology
Hi-index | 0.00 |
Combination of multiple acoustic features has great potential to improve Automatic Speech Recognition (ASR) accuracy. Our contribution in this research was to investigate one novel method when using voiced formants' features in combination with standard MFCC features in order to enhance TIMIT phone recognition. These voiced features provide accurate formants frequencies using a Variable Order LPC Coding (VO-LPC) algorithm that was combined with continuity constraints. The overall estimating formants were concatenated with MFCC features when a voiced frame could be detected. For feature-level combination, Heteroscedastic Linear Discriminant Analysis (HLDA) based approach had been used successfully to find an optimal linear combination of successive vectors of a single feature stream.A series of experiments on phone recognition speaker-independent continuous-speech had been carried out using a subset of the large read-speech TIMIT phone corpus. Hidden Markov Model Toolkit (HTK) was also used throughout all carried experiments. Using such feature level combination, optimized mixture splitting and a bigram language model, a detailed analysis on this combination performance was discussed for Context-Independent (CI) and Context-Dependent (CD) Hidden Markov Models (HMM). Experimental results from our proposed procedure showed that phone error rate was successfully decreased by about 3%. At phonetic level group, an increase of 8% and of 10% was observed respectively for vowel and liquid group. These results proved clear phone enhancement regarding existing state of the art.