Acoustic features for speech recognition based on Gammatone filterbank and instantaneous frequency

Authors:
Hui Yin; Volker Hohmann; Climent Nadeu
Affiliations:
Hui Yin: TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain, and Department of Electronic Engineering, Beijing Institute of Technology, Beijing 100081, China
Volker Hohmann: TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain, and Medizinische Physik, Universität Oldenburg, Germany
Climent Nadeu: TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
Venue:
Speech Communication
Year:
2011

Abstract

Most of the features used by modern automatic speech recognition systems, such as mel-frequency cepstral coefficients (MFCC) and perceptual linear predictive (PLP) coefficients, represent only the spectral envelope of the speech signal. However, phase or frequency modulation, as represented in recent perceptual models of the peripheral auditory system, might also contribute to speech decoding, and features derived from it can be complementary to the envelope features. This paper proposes a variety of features based on a linear auditory filterbank, the Gammatone filterbank. Envelope features are derived from the envelope of the subband filter outputs. Phase/frequency modulation is represented by the subband instantaneous frequency (IF) and is used either explicitly, by concatenating envelope-based and IF-based features, or implicitly, through IF-based frequency reassignment. Speech recognition experiments with a standard HMM-based recognizer, under both clean training and multi-condition training, are conducted on a Mandarin Chinese digit corpus. The experimental results show that the proposed envelope- and phase-based features can improve recognition rates in clean and noisy conditions compared to the reference MFCC-based recognizer.
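The abstract describes the feature pipeline only at a high level. The Python sketch below illustrates the general idea under our own assumptions and is not the authors' implementation. It uses a truncated FIR complex (analytic) gammatone filterbank with 4th-order filters, bandwidth 1.019 x ERB (Glasberg and Moore), centre frequencies spaced on the ERB-rate scale, and 25 ms frames with a 10 ms shift. All of these are common textbook choices, and the filter design used in the paper may differ. The subband envelope is the magnitude of each complex output. The IF is the phase advance between consecutive samples. The helpers `concatenated_features` (explicit use of IF) and `reassign` (implicit use, a simplified nearest-centre reassignment) are hypothetical illustrations. The paper's actual post-processing, for example any cepstral transform, is not reproduced here.

```python
import cmath
import math


def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 + 0.108 * fc


def erb_space(f_low, f_high, n):
    """n centre frequencies equally spaced on the ERB-rate scale."""
    def to_rate(f):
        return 21.4 * math.log10(1.0 + 0.00437 * f)

    def from_rate(e):
        return (10.0 ** (e / 21.4) - 1.0) / 0.00437

    lo, hi = to_rate(f_low), to_rate(f_high)
    return [from_rate(lo + (hi - lo) * i / (n - 1)) for i in range(n)]


def gammatone_ir(fc, fs, order=4, duration=0.05):
    """Truncated complex gammatone impulse response, normalised to unit gain at fc."""
    b = 1.019 * erb(fc)  # assumed bandwidth parameter (common choice)
    h = [(k / fs) ** (order - 1) * math.exp(-2 * math.pi * b * k / fs)
         * cmath.exp(2j * math.pi * fc * k / fs)
         for k in range(int(duration * fs))]
    gain = abs(sum(hk * cmath.exp(-2j * math.pi * fc * k / fs)
                   for k, hk in enumerate(h)))
    return [hk / gain for hk in h]


def filter_causal(h, x):
    """Direct-form FIR convolution; output has the same length as x."""
    return [sum(h[k] * x[n - k] for k in range(min(n + 1, len(h))))
            for n in range(len(x))]


def subband_features(x, fs, centres, frame_len=0.025, frame_shift=0.01):
    """Per-frame mean envelope and mean instantaneous frequency (Hz) per channel."""
    outs = [filter_causal(gammatone_ir(fc, fs), x) for fc in centres]
    flen, fshift = int(frame_len * fs), int(frame_shift * fs)
    env_frames, if_frames = [], []
    for start in range(0, len(x) - flen + 1, fshift):
        env_row, if_row = [], []
        for y in outs:
            seg = y[start:start + flen]
            # Envelope: magnitude of the complex subband signal
            env_row.append(sum(abs(v) for v in seg) / flen)
            # IF: phase advance between consecutive samples, converted to Hz
            inst = [cmath.phase(seg[i] * seg[i - 1].conjugate()) * fs / (2 * math.pi)
                    for i in range(1, flen)]
            if_row.append(sum(inst) / len(inst))
        env_frames.append(env_row)
        if_frames.append(if_row)
    return env_frames, if_frames


def concatenated_features(env_frames, if_frames, centres):
    """Explicit use (hypothetical): log envelope followed by IF deviation from each centre."""
    return [[math.log(e + 1e-10) for e in er] + [f - fc for f, fc in zip(fr, centres)]
            for er, fr in zip(env_frames, if_frames)]


def reassign(env_frames, if_frames, centres):
    """Implicit use (simplified, hypothetical): move each channel's envelope
    to the channel whose centre frequency is nearest to that channel's IF."""
    out = []
    for er, fr in zip(env_frames, if_frames):
        row = [0.0] * len(centres)
        for e, f in zip(er, fr):
            j = min(range(len(centres)), key=lambda c: abs(centres[c] - f))
            row[j] += e
        out.append(row)
    return out


if __name__ == "__main__":
    fs = 8000
    x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(800)]  # 0.1 s, 1 kHz tone
    centres = erb_space(500, 2000, 6)
    env, inst = subband_features(x, fs, centres)
    feats = concatenated_features(env, inst, centres)
    print(len(feats), len(feats[0]))  # 8 frames, 12 values per frame
```

For a pure 1 kHz tone, every channel's output settles to a 1 kHz sinusoid once the FIR filter is fully loaded. The frame-averaged IF is therefore close to 1000 Hz in all channels, while the envelope peaks in the channel whose centre is nearest 1 kHz. The simplified reassignment accordingly collects almost all envelope energy in that channel. This is the kind of spectral sharpening that IF-based reassignment aims for.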