Phase AutoCorrelation (PAC) features for noise robust speech recognition

Authors:
Shajith Ikbal; Hemant Misra; Hynek Hermansky; Mathew Magimai-Doss
Affiliations:
IBM Research, Bangalore, India; Philips Research, Bangalore, India; Johns Hopkins University, Baltimore, MD, USA; Idiap Research Institute, Martigny, Switzerland
Venue:
Speech Communication
Year:
2012

Abstract

In this paper, we introduce a new class of noise-robust features derived from an alternative measure of autocorrelation that represents the phase variation of a speech signal frame over time. These features, referred to as Phase AutoCorrelation (PAC) features, include the PAC-spectrum and PAC-MFCC, among others. In traditional autocorrelation, the correlation between two time-delayed signal vectors is computed as their dot product. In PAC, by contrast, the angle between the vectors in the signal vector space is used to compute the correlation. PAC features are more noise robust because the angle is typically less affected by noise than the dot product. However, using the angle as the correlation estimate makes the PAC features perform worse than traditional features on clean speech. We circumvent this problem by introducing another set of features in which complementary information from the PAC features and the traditional features is combined adaptively to retain the best of both. An entropy-based feature combination method in a multi-layer perceptron (MLP) based multi-stream framework is used to derive an adaptively combined representation of the component feature streams. An evaluation of the combined features on the OGI Numbers95 and Aurora-2 databases under various noise conditions and noise levels shows significant improvements in recognition accuracy in clean as well as noisy conditions.
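The abstract contrasts the dot-product autocorrelation with the angle-based PAC measure and mentions an entropy-based combination of MLP feature streams. The Python/NumPy sketch below illustrates both ideas under stated assumptions, not as the authors' implementation. It assumes that the time-delayed vector for lag k is the frame circularly shifted by k samples (so both vectors have the same norm and cos(theta_k) = R[k]/R[0]), that the PAC coefficient is the angle theta_k itself, and that streams are merged by inverse-entropy weighting of per-stream posteriors, which is one common entropy-based rule. All function names are hypothetical, and the paper's exact definitions may differ.

```python
from typing import Sequence

import numpy as np


def autocorrelation(frame: np.ndarray) -> np.ndarray:
    """Traditional autocorrelation as a dot product of time-delayed vectors.

    Assumption: the delayed vector for lag k is the frame circularly shifted
    by k samples, so R[k] = x . roll(x, -k) for k = 0 .. N-1.
    """
    x = np.asarray(frame, dtype=float)
    return np.array([np.dot(x, np.roll(x, -k)) for k in range(len(x))])


def phase_autocorrelation(frame: np.ndarray) -> np.ndarray:
    """PAC sketch: the angle between the frame and its circular shift by k.

    A circular shift preserves the vector norm, so both vectors have squared
    norm R[0] and cos(theta_k) = R[k] / R[0].
    """
    r = autocorrelation(frame)
    if r[0] == 0.0:
        raise ValueError("PAC is undefined for an all-zero frame")
    # Clip to [-1, 1] to guard against floating-point overshoot before arccos.
    cos_theta = np.clip(r / r[0], -1.0, 1.0)
    return np.arccos(cos_theta)  # angles in [0, pi]


def entropy(posteriors: np.ndarray) -> float:
    """Shannon entropy (in nats) of one posterior probability vector."""
    p = np.clip(np.asarray(posteriors, dtype=float), 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))


def inverse_entropy_combine(streams: Sequence[np.ndarray]) -> np.ndarray:
    """Hypothetical combination rule: weight each stream's posteriors by
    1 / entropy, with the weights normalized to sum to one."""
    inv = np.array([1.0 / max(entropy(p), 1e-12) for p in streams])
    weights = inv / inv.sum()
    combined = sum(w * np.asarray(p, dtype=float) for w, p in zip(weights, streams))
    return combined / combined.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = np.arange(256)
    # Synthetic test frame: a sinusoid plus a little white noise.
    frame = np.sin(2 * np.pi * 0.05 * n) + 0.1 * rng.standard_normal(256)
    print(phase_autocorrelation(frame)[:5])

    confident = np.array([0.9, 0.05, 0.05])      # posteriors from one stream
    uncertain = np.array([1 / 3, 1 / 3, 1 / 3])  # posteriors from another stream
    print(inverse_entropy_combine([confident, uncertain]))
```

Because each angle in this sketch depends only on the ratio R[k]/R[0], the PAC values are unchanged when the frame is rescaled, whereas the raw autocorrelation scales with signal energy. The inverse-entropy rule gives more weight, frame by frame, to whichever stream produces the more confident (lower-entropy) posteriors.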