Multiple resolution analysis for robust automatic speech recognition

Authors:
Roberto Gemello; Franco Mana; Dario Albesano; Renato De Mori
Affiliations:
Loquendo, Via Valdellatorre, 4, 10149 Torino, Italy; Loquendo, Via Valdellatorre, 4, 10149 Torino, Italy; Loquendo, Via Valdellatorre, 4, 10149 Torino, Italy; LIA CERI-IUP, University of Avignon, BP 1228, 84911 Avignon Cedex 9, France
Venue:
Computer Speech and Language
Year:
2006

Abstract

This paper investigates the potential of exploiting the redundancy implicit in multiple resolution analysis for automatic speech recognition systems. The analysis is performed by a binary tree of elements, each consisting of a half-band filter followed by a downsampler that discards odd samples. Filter design and the computation of features from samples are discussed, and recognition performance for different choices is presented. A paradigm is proposed that consists of redundant feature extraction, followed by feature normalization, followed by dimensionality reduction. Feature normalization is performed by denoising algorithms, two of which are considered and evaluated: signal-to-noise-ratio-dependent spectral subtraction and soft thresholding. Dimensionality reduction is performed with principal component analysis. Experiments using telephone corpora and the Aurora3 corpus are reported. They indicate that, on clean speech, the proposed paradigm yields a word error rate marginally better than that obtained with perceptual linear prediction coefficients. On noisy data, however, the proposed paradigm performs significantly better when the same denoising algorithm is applied to all of the analysis methods being compared.
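
The three-stage paradigm described in the abstract (redundant feature extraction, normalization by denoising, dimensionality reduction) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: 2-tap Haar filters stand in for the paper's half-band filters, the tree is expanded on both branches at every level, the features are log energies of each node, soft thresholding is applied to the node samples, and all function names are hypothetical.

```python
import numpy as np

# Assumption: 2-tap Haar filters stand in for the paper's half-band filters.
LOW = np.array([1.0, 1.0]) / np.sqrt(2.0)
HIGH = np.array([1.0, -1.0]) / np.sqrt(2.0)


def split(x):
    """One tree element: half-band filtering, then discard odd-indexed samples."""
    low = np.convolve(x, LOW, mode="valid")[::2]
    high = np.convolve(x, HIGH, mode="valid")[::2]
    return low, high


def soft_threshold(v, t):
    """Shrink every value toward zero by t; values within [-t, t] become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def tree_features(frame, depth, threshold=0.0):
    """Log energy of every node of a full binary tree (hypothetical feature choice).

    The frame length is assumed to be divisible by 2**depth.
    Nodes at all levels are kept, so the feature set is redundant.
    """
    level = [np.asarray(frame, dtype=float)]
    feats = []
    for _ in range(depth):
        children = []
        for node in level:
            children.extend(split(node))
        level = children
        for node in level:
            # Assumption: denoise node samples before measuring their energy.
            cleaned = soft_threshold(node, threshold)
            feats.append(np.log(np.sum(cleaned ** 2) + 1e-10))
    return np.array(feats)


def pca_reduce(features, k):
    """Project row-wise feature vectors onto their first k principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


# Usage on synthetic frames (random data, not speech).
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 64))
redundant = np.array([tree_features(f, depth=3, threshold=0.1) for f in frames])
reduced = pca_reduce(redundant, k=5)
```

With a depth of 3, each frame yields 2 + 4 + 8 = 14 node energies, which the final step projects down to 5 dimensions. The SNR-dependent spectral subtraction mentioned in the abstract is not sketched here, since the abstract gives no details of its formulation.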