Significance of joint features derived from the modified group delay function in speech processing

Authors:
Rajesh M. Hegde; Hema A. Murthy; V. R. R. Gadde
Affiliations:
Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA; Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India; STAR Lab, SRI International, Menlo Park, CA
Venue:
EURASIP Journal on Audio, Speech, and Music Processing
Year:
2007

Abstract

This paper investigates the significance of combining cepstral features derived from the modified group delay function with features derived from the short-time spectral magnitude, such as the MFCC. The conventional group delay function fails to capture the resonant structure and the dynamic range of the speech spectrum, primarily because pitch periodicity effects introduce spikes into it. The group delay function is therefore modified to suppress these spikes and to restore the dynamic range of the speech spectrum. Cepstral features derived from the modified group delay function are called the modified group delay feature (MODGDF). The complementarity and robustness of the MODGDF relative to the MFCC are analyzed using spectral reconstruction techniques. The combination of several spectral magnitude-based features with the MODGDF, using feature fusion and likelihood combination, is then described. These features are used for three speech processing tasks: syllable, speaker, and language recognition. Results indicate that combining the MODGDF with the MFCC at the feature level gives significant improvements for speech recognition tasks in noise. Combining the MODGDF with spectral magnitude-based features improves recognition performance by up to 11%, whereas combining any two features derived from the spectral magnitude gives no significant improvement.
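
The abstract does not give formulas, so the following is only a rough sketch of how a modified group delay feature might be computed for one frame. It assumes the commonly used formulation in which the group delay is obtained from the spectra of x[n] and n*x[n], the denominator |X|^2 is replaced by a cepstrally smoothed spectrum raised to the power 2*gamma, and the result is compressed with an exponent alpha before a DCT produces cepstra. All function names, and the values of `alpha`, `gamma`, and `n_lifter`, are illustrative placeholders and are not taken from this record.

```python
import cmath
import math


def dft(x, n_fft):
    """Naive O(N^2) DFT of the sequence x, zero-padded to n_fft points."""
    return [
        sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(x))
        for k in range(n_fft)
    ]


def group_delay(frame, n_fft):
    """Conventional group delay: (X_R*Y_R + X_I*Y_I) / |X|^2, with y[n] = n*x[n]."""
    X = dft(frame, n_fft)
    Y = dft([n * v for n, v in enumerate(frame)], n_fft)
    eps = 1e-12  # guards against division by zero at spectral nulls
    return [
        (a.real * b.real + a.imag * b.imag) / max(abs(a) ** 2, eps)
        for a, b in zip(X, Y)
    ]


def smoothed_magnitude(X, n_lifter):
    """Cepstrally smoothed magnitude: keep only low-quefrency cepstral terms."""
    n_fft = len(X)
    log_mag = [math.log(max(abs(v), 1e-12)) for v in X]
    # Real cepstrum = inverse DFT of the log magnitude spectrum.
    cep = [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * n / n_fft)
            for k in range(n_fft)).real / n_fft
        for n in range(n_fft)
    ]
    # Symmetric lifter: keep the first n_lifter terms and their mirror images.
    liftered = [
        c if (n < n_lifter or n > n_fft - n_lifter) else 0.0
        for n, c in enumerate(cep)
    ]
    return [math.exp(v.real) for v in dft(liftered, n_fft)]


def modified_group_delay(frame, n_fft, alpha=0.4, gamma=0.9, n_lifter=8):
    """Assumed modification: smoothed denominator plus sign-preserving compression."""
    X = dft(frame, n_fft)
    Y = dft([n * v for n, v in enumerate(frame)], n_fft)
    S = smoothed_magnitude(X, n_lifter)
    out = []
    for a, b, s in zip(X, Y, S):
        tau = (a.real * b.real + a.imag * b.imag) / (s ** (2 * gamma))
        out.append(math.copysign(abs(tau) ** alpha, tau))
    return out


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II, returning the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def modgdf(frame, n_fft, n_coeffs, **kwargs):
    """Cepstra from the modified group delay over bins 0 .. n_fft/2 (hypothetical helper)."""
    tau_m = modified_group_delay(frame, n_fft, **kwargs)
    return dct_ii(tau_m[: n_fft // 2 + 1], n_coeffs)
```

A convenient sanity check is a unit impulse delayed by d samples: its conventional group delay equals d in every frequency bin, and since its magnitude spectrum is flat, the modified version in this sketch reduces to d raised to the power alpha.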