Robust features for speaker-independent speech recognition based on a certain class of translation-invariant transformations

Authors:
Florian Müller;Alfred Mertins
Affiliations:
Institute for Signal Processing, University of Lübeck, Lübeck, Germany;Institute for Signal Processing, University of Lübeck, Lübeck, Germany
Venue:
NOLISP'09 Proceedings of the 2009 international conference on Advances in Nonlinear Speech Processing
Year:
2009

Abstract

The spectral effects of vocal tract length (VTL) differences are one reason why today's speaker-independent automatic speech recognition (ASR) systems achieve lower recognition rates than speaker-dependent ones. With certain types of filter banks, the VTL-related effects can be described as a translation in subband-index space. In this paper, nonlinear translation-invariant transformations that were originally proposed in the field of pattern recognition are investigated for their applicability to speaker-independent ASR tasks. It is shown that combining different types of such transformations yields features that are more robust against VTL changes than the standard mel-frequency cepstral coefficients, and that these features come close to the performance of vocal tract length normalization without any adaptation to individual speakers.
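
To illustrate what translation invariance in subband-index space means, here is a minimal sketch in Python. It uses the cyclic autocorrelation as a stand-in example of a translation-invariant transformation; the abstract does not say which transformations the paper investigates, so this is not necessarily one of them. The subband values, the function names, and the assumption that a VTL change acts as a pure cyclic shift of the subband vector are all hypothetical simplifications.

```python
def cyclic_autocorrelation(x):
    """Return r[k] = sum_n x[n] * x[(n + k) % N] for k = 0..N-1."""
    n = len(x)
    return [sum(x[i] * x[(i + k) % n] for i in range(n)) for k in range(n)]


def cyclic_shift(x, s):
    """Shift the sequence x to the right by s positions, wrapping around."""
    n = len(x)
    s %= n
    return x[-s:] + x[:-s] if s else list(x)


# Hypothetical subband energies for one frame (made-up illustrative values).
frame = [0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0]

# Assumption: a different vocal tract length moves the same spectral
# pattern by two subband indices.
shifted = cyclic_shift(frame, 2)

features_a = cyclic_autocorrelation(frame)
features_b = cyclic_autocorrelation(shifted)

# The input vectors differ, but the derived features are identical.
print(frame != shifted, features_a == features_b)
```

The two frames differ, yet their features coincide, which is the property that makes such transformations attractive for speaker-independent recognition. In a real filter bank the translation is not cyclic and is only approximate, so the invariance would hold only approximately.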