Feature normalization based on non-extensive statistics for speech recognition

Authors:
Hilman F. Pardede; Koji Iwano; Koichi Shinoda
Affiliations:
Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Ookayama 2-12-1, Meguro-ku, Tokyo 152-8552, Japan
Faculty of Environmental and Information Studies, Tokyo City University, Ushikubo-nishi, 3-3-1, Tsuzuki-ku, Yokohama 224-8551, Japan
Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Ookayama 2-12-1, Meguro-ku, Tokyo 152-8552, Japan
Venue:
Speech Communication
Year:
2013

Abstract

Most compensation methods for improving the robustness of speech recognition systems in noisy environments, such as spectral subtraction, cepstral mean normalization (CMN), and mean and variance normalization (MVN), rely on the assumption that noise and speech spectra are independent. However, the use of a limited window in signal processing may introduce a cross-term between them, which degrades speech recognition accuracy. To tackle this problem, we introduce the q-logarithmic (q-log) spectral domain of non-extensive statistics and propose q-log spectral mean normalization (q-LSMN), an extension of log spectral mean normalization (LSMN) to this domain. Recognition experiments on a synthesized noisy speech database, the Aurora-2 database, showed that q-LSMN was consistently better than the conventional normalization methods CMN, LSMN, and MVN. Furthermore, q-LSMN was even more effective when applied to a real noisy environment in the CENSREC-2 database, where it significantly outperformed the ETSI AFE front-end.
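
The abstract does not spell out the q-log transform, so the following is only a minimal sketch of the idea, assuming the standard q-logarithm of non-extensive (Tsallis) statistics, ln_q(x) = (x^(1-q) - 1) / (1 - q), which reduces to the natural logarithm as q approaches 1. The function names q_log and q_lsmn, the per-bin mean over frames, and the choice of q are illustrative assumptions, not the authors' implementation; the paper's exact formulation may differ.

```python
import math


def q_log(x, q):
    """Standard Tsallis q-logarithm (assumed definition, not taken from the paper).

    Reduces to the natural logarithm in the limit q -> 1.
    """
    if x <= 0:
        raise ValueError("q_log is defined for positive inputs only")
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def q_lsmn(spectra, q):
    """Hypothetical q-log spectral mean normalization.

    spectra: list of frames, each a list of positive spectral values.
    Each value is mapped to the q-log domain, then the mean over frames
    is subtracted separately for every frequency bin.
    """
    if not spectra:
        return []
    n_frames = len(spectra)
    n_bins = len(spectra[0])
    q_spec = [[q_log(v, q) for v in frame] for frame in spectra]
    means = [sum(frame[k] for frame in q_spec) / n_frames for k in range(n_bins)]
    return [[frame[k] - means[k] for k in range(n_bins)] for frame in q_spec]
```

With q = 1 this sketch collapses to ordinary LSMN, which is the sense in which q-LSMN extends LSMN to the q-log domain; other values of q change how strongly large and small spectral values are compressed before the mean is removed.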