Data-driven spectral basis functions for automatic speech recognition

Authors:
Naren Malayath; Hynek Hermansky
Affiliations:
Qualcomm Inc., AA-318V, 5775 Morehouse Drive, San Diego, CA; OGI School of Science and Technology, Oregon Health and Science University, Portland, OR and International Computer Science Institute, Berkeley, CA
Venue:
Speech Communication
Year:
2003

Abstract

Feature extraction plays a major role in any form of pattern recognition. Current feature extraction methods for automatic speech recognition (ASR) and speaker verification rely mainly on properties of speech production (modeled by all-pole filters) and of speech perception (critical-band integration, simulated by a Mel or Bark filter bank). We propose using stochastic methods to design feature extractors that are trained to alleviate the unwanted variability present in speech signals. In this paper we show that such data-driven methods offer significant advantages over conventional methods, both in ASR performance and in the understanding they provide of the nature of the speech signal.

The first part of the paper investigates the suitability of cepstral features obtained by applying the discrete cosine transform (DCT) to logarithmic critical-band power spectra. An alternative set of basis functions was designed by linear discriminant analysis (LDA) of the same logarithmic critical-band power spectra. In ASR experiments, the discriminant features extracted with these alternative basis functions outperform the cepstral features.

The second part of the paper discusses the relevance of the non-uniform frequency resolution used by current speech analysis methods such as Mel-frequency analysis and perceptual linear predictive (PLP) analysis. LDA of the short-time Fourier spectrum of speech is shown to yield spectral basis functions that give comparatively lower resolution to the high-frequency region of the spectrum. This is consistent with critical-band resolution and is shown to be caused by the spectral properties of vowel sounds.
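The contrast between the fixed cosine basis and a data-driven LDA basis can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the 15-band input, the synthetic labeled frames, and the helper names `dct_basis` and `lda_basis` are hypothetical, and the LDA step uses one standard approach (a Cholesky reduction of the generalized eigenproblem S_b v = lambda S_w v, with S_b the between-class and S_w the within-class scatter). Each row of a returned matrix is one spectral basis function; projecting a log spectrum onto the rows yields the feature vector.

```python
import numpy as np


def dct_basis(n_bands, n_coeffs):
    """Orthonormal DCT-II basis, shape (n_coeffs, n_bands).

    Projecting a log critical-band spectrum onto row k gives cepstral coefficient k.
    """
    j = np.arange(n_bands)
    k = np.arange(n_coeffs)[:, None]
    basis = np.sqrt(2.0 / n_bands) * np.cos(np.pi * k * (2 * j + 1) / (2 * n_bands))
    basis[0] /= np.sqrt(2.0)  # DC row scaled so every row has unit norm
    return basis


def lda_basis(x, labels, n_coeffs):
    """LDA basis functions, shape (n_coeffs, dim), learned from labeled frames.

    x: (n_frames, dim) log spectral vectors; labels: (n_frames,) class ids.
    Keeps the leading solutions of S_b v = lambda S_w v.
    """
    n_frames, dim = x.shape
    global_mean = x.mean(axis=0)
    s_w = np.zeros((dim, dim))  # within-class scatter
    s_b = np.zeros((dim, dim))  # between-class scatter
    for c in np.unique(labels):
        xc = x[labels == c]
        mc = xc.mean(axis=0)
        centered = xc - mc
        s_w += centered.T @ centered
        d = (mc - global_mean)[:, None]
        s_b += len(xc) * (d @ d.T)
    s_w /= n_frames
    s_b /= n_frames
    # Reduce to a symmetric eigenproblem: with S_w = L L^T and v = L^{-T} u,
    # S_b v = lambda S_w v becomes (L^{-1} S_b L^{-T}) u = lambda u.
    l_inv = np.linalg.inv(np.linalg.cholesky(s_w))
    evals, evecs = np.linalg.eigh(l_inv @ s_b @ l_inv.T)
    order = np.argsort(evals)[::-1][:n_coeffs]  # largest class separation first
    return (l_inv.T @ evecs[:, order]).T


# Hypothetical toy data: 15-band "log spectra" for three classes that
# differ only in band 3 (real experiments would use labeled speech frames).
rng = np.random.default_rng(0)
n_bands = 15
labels = np.repeat([0, 1, 2], 200)
x = rng.normal(size=(labels.size, n_bands))
x[labels == 1, 3] += 3.0
x[labels == 2, 3] -= 3.0

cepstra = x @ dct_basis(n_bands, 13).T    # fixed cosine basis
w_lda = lda_basis(x, labels, n_coeffs=2)  # data-driven basis
lda_feats = x @ w_lda.T
```

The DCT basis is fixed in advance, whereas the LDA basis depends entirely on the labeled training frames; in this toy case the leading LDA basis function concentrates its weight on band 3, the only band that separates the classes. The same `lda_basis` call could, in principle, be applied to log short-time Fourier spectra (one dimension per FFT bin) to inspect how much resolution the learned basis functions allocate to each frequency region.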