Contextual invariant-integration features for improved speaker-independent speech recognition

  • Authors:
  • Florian Müller; Alfred Mertins

  • Affiliations:
  • Institute for Signal Processing, University of Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany (both authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2011

Abstract

This work presents a feature-extraction method based on the theory of invariant integration. The invariant-integration features (IIFs) are derived from an extended time period, and their computation has very low complexity. Recognition experiments show that the presented feature type outperforms cepstral coefficients based on a mel filterbank (MFCCs) or a gammatone filterbank (GTCCs) under both matching and mismatching training-testing conditions. Even without any speaker adaptation, the presented features yield higher accuracies than MFCCs combined with vocal tract length normalization (VTLN) under matching training-testing conditions. It is also shown that the IIFs can be successfully combined with additional speaker-adaptation methods to increase the accuracy further. In addition to standard MFCCs, contextual MFCCs are introduced; their performance lies between that of MFCCs and IIFs.
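The abstract does not spell out the feature definition. As a rough sketch, invariant integration can be read as averaging monomials of time-frequency values over a group of subband translations, which approximates the effect of VTLN-style spectral warping; the temporal offsets in the monomial provide the extended time context mentioned above. The Python snippet below is a minimal illustration under that assumption only. The function name `monomial_iif`, its signature, and the example parameters are hypothetical and do not reproduce the authors' implementation (in particular, the choice of monomial components is not shown here).

```python
import numpy as np

def monomial_iif(spec, n, components, W):
    """Hypothetical sketch of one invariant-integration feature (IIF).

    spec       -- time-frequency representation, shape (frames, channels),
                  e.g. log outputs of a mel or gammatone filterbank
    n          -- centre frame index
    components -- list of (time_offset, channel, exponent) tuples defining
                  the monomial; the offsets supply temporal context
    W          -- number of subband shifts to average over (the
                  "integration" over the translation group)
    """
    acc = 0.0
    for w in range(W + 1):
        # Evaluate the monomial with every channel index shifted by w.
        prod = 1.0
        for (dt, k, e) in components:
            prod *= spec[n + dt, k + w] ** e
        acc += prod
    # Averaging over the shifts makes the feature (approximately)
    # invariant to subband translations of the spectrum.
    return acc / (W + 1)

# Example: a two-term monomial averaged over 6 subband shifts
# (random data as a stand-in for a real filterbank spectrogram).
spec = np.abs(np.random.randn(100, 40))
feat = monomial_iif(spec, n=50, components=[(0, 3, 1.0), (2, 5, 0.5)], W=5)
```

The low computational complexity claimed in the abstract is plausible from this form: each feature is a short sum of products over a fixed window, with no per-utterance warping-factor search as in grid-based VTLN.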