Feature extraction using non-linear transformation for robust speech recognition on the Aurora database

Authors:
S. Sharma;D. Ellis;S. Kajarekar;P. Jain;H. Hermansky
Affiliations:
Oregon Graduate Inst. of Sci. & Technol., Portland, OR, USA;-;-;-;-
Venue:
ICASSP '00: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 02
Year:
2000

Citing 0
Cited 5

Reduced feature-set based parallel CHMM speech recognition systems

Information Sciences—Informatics and Computer Science: An International Journal - Special issue: Spoken language analysis, modeling and recognition-statistical and adaptive connectionist approaches
Speech encoding in a model of peripheral auditory processing: Quantitative assessment by means of automatic speech recognition

Speech Communication
Enhancing Speaker Discrimination at the Feature Level

Speaker Classification I
Audio-visual speaker identification using dynamic facial movements and utterance phonetic content

Applied Soft Computing
MLP internal representation as discriminative features for improved speaker recognition

NOLISP'05 Proceedings of the 3rd international conference on Non-Linear Analyses and Algorithms for Speech Processing

Abstract

We evaluate the performance of several feature sets on the Aurora task as defined by ETSI. We show that, after a non-linear transformation, a number of features can be used effectively in an HMM-based recognition system. The non-linear transformation is computed by a neural network that is discriminatively trained on the phonetically labeled (force-aligned) training data. A combination of the non-linearly transformed PLP (perceptual linear predictive coefficients), MSG (modulation-filtered spectrogram) and TRAP (temporal pattern) features yields a 63% improvement in error rate over baseline mel-frequency cepstral coefficient (MFCC) features. The use of non-linearly transformed RASTA-like features, with system parameters scaled down to meet the ETSI-imposed memory and latency constraints, still yields a 40% improvement in error rate.
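
The idea of a discriminatively trained non-linear feature transformation can be illustrated with a minimal sketch. The abstract states only that a neural network is trained on phonetically labeled frames and that its output feeds an HMM-based recognizer. Everything else below is an assumption made for illustration: the single tanh hidden layer, the 9-frame context window, the use of log posteriors as the transformed features, the synthetic data, and all function names (`stack_context`, `train_mlp`, `transform`, `demo`).

```python
import numpy as np

def stack_context(frames, left=4, right=4):
    """Concatenate each frame with its neighbours (edge frames are repeated)."""
    padded = np.pad(frames, ((left, right), (0, 0)), mode="edge")
    n = frames.shape[0]
    return np.hstack([padded[i:i + n] for i in range(left + right + 1)])

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_mlp(x, labels, n_classes, hidden=32, epochs=300, lr=0.1, seed=0):
    """Full-batch gradient descent on cross-entropy against frame-level labels."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.1, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, n_classes))
    b2 = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)
        p = softmax(h @ w2 + b2)
        g_out = (p - onehot) / len(x)            # gradient w.r.t. output pre-activations
        g_hid = (g_out @ w2.T) * (1.0 - h ** 2)  # backpropagate through tanh
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (x.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return w1, b1, w2, b2

def transform(x, params):
    """Map input features to log class posteriors (assumed form of the new features)."""
    w1, b1, w2, b2 = params
    h = np.tanh(x @ w1 + b1)
    return np.log(softmax(h @ w2 + b2) + 1e-10)

def demo(seed=0):
    """Synthetic stand-in for labeled speech: 3 'phone' classes, 13-dim frames."""
    rng = np.random.default_rng(seed)
    n_classes, dim, per_class = 3, 13, 100
    labels = np.repeat(np.arange(n_classes), per_class)   # contiguous segments
    means = rng.normal(0.0, 2.0, (n_classes, dim))
    frames = means[labels] + rng.normal(0.0, 1.0, (len(labels), dim))
    x = stack_context(frames)
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)      # standardize inputs
    params = train_mlp(x, labels, n_classes)
    feats = transform(x, params)
    accuracy = float((feats.argmax(axis=1) == labels).mean())
    return feats, labels, accuracy

if __name__ == "__main__":
    feats, labels, accuracy = demo()
    print(feats.shape, accuracy)
```

In a real system, the rows returned by `transform` would replace or augment the conventional features passed to the HMM back end. Whether the paper applies any further processing to the network output is not stated in the abstract.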