Combining formant frequency based on variable order LPC coding with acoustic features for TIMIT phone recognition

Authors:
Zaineb Ben Messaoud;Ahmed Ben Hamida
Affiliations:
Technologie de l'information et électronique médicale, ATMS-LETI, ENIS, Sfax University, Sfax, Tunisie;Technologie de l'information et électronique médicale, ATMS-LETI, ENIS, Sfax University, Sfax, Tunisie
Venue:
International Journal of Speech Technology
Year:
2011

Abstract

Combining multiple acoustic features has great potential to improve Automatic Speech Recognition (ASR) accuracy. This work investigates a novel method that uses voiced formant features in combination with standard MFCC features to enhance TIMIT phone recognition. The voiced features provide accurate formant frequencies, estimated with a Variable Order LPC Coding (VO-LPC) algorithm combined with continuity constraints. The estimated formants were concatenated with the MFCC features whenever a voiced frame was detected. For feature-level combination, a Heteroscedastic Linear Discriminant Analysis (HLDA) based approach was used to find an optimal linear combination of successive vectors of a single feature stream. A series of speaker-independent, continuous-speech phone recognition experiments was carried out on a subset of the large read-speech TIMIT corpus, using the Hidden Markov Model Toolkit (HTK) throughout. Using this feature-level combination, optimized mixture splitting, and a bigram language model, the performance of the combination was analyzed in detail for Context-Independent (CI) and Context-Dependent (CD) Hidden Markov Models (HMMs). Experimental results showed that the proposed procedure decreased the phone error rate by about 3%. At the phonetic-group level, increases of 8% and 10% were observed for the vowel and liquid groups, respectively. These results show a clear improvement in phone recognition relative to the existing state of the art.
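The two feature-side steps named in the abstract, formant estimation from LPC and concatenation with MFCC on voiced frames, can be illustrated with a minimal sketch. This is not the authors' VO-LPC algorithm: it uses a fixed LPC order chosen by the caller, omits the continuity constraints, and omits the HLDA projection. The function names, the NumPy-only implementation, and the choice to zero-fill the formant slots on unvoiced frames are all assumptions made for illustration; the abstract does not say how unvoiced frames are handled.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC (fixed order, an assumption of this sketch).

    Returns a[0..order-1] such that x[n] is predicted by
    sum_k a[k] * x[n - 1 - k].
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


def formant_candidates(frame, order, fs):
    """Formant candidates in Hz, taken from the angles of the LPC poles."""
    a = lpc_coefficients(frame, order)
    # Poles are the roots of z^p - a1 z^(p-1) - ... - ap.
    poles = np.roots(np.concatenate(([1.0], -a)))
    poles = poles[np.imag(poles) > 0]  # one pole per complex-conjugate pair
    return np.sort(np.angle(poles) * fs / (2.0 * np.pi))


def append_voiced_formants(mfcc, formants, voiced):
    """Concatenate per-frame formants to MFCC vectors on voiced frames.

    Unvoiced frames get zeros in the formant slots (an assumption of this
    sketch, used to keep the feature dimension constant).
    """
    mfcc = np.asarray(mfcc, dtype=float)          # shape (T, n_mfcc)
    formants = np.asarray(formants, dtype=float)  # shape (T, n_formants)
    voiced = np.asarray(voiced, dtype=bool)       # shape (T,)
    extra = np.where(voiced[:, None], formants, 0.0)
    return np.hstack([mfcc, extra])
```

In use, `formant_candidates` would be called once per analysis frame of the 16 kHz TIMIT audio, and the selected formants stacked into a `(T, n_formants)` array before calling `append_voiced_formants`. A variable-order scheme would additionally choose `order` per frame, which this sketch leaves to the caller.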