An evaluation study on speech feature densities for Bayesian estimation in robust ASR

Authors:
Simone Cifani; Emanuele Principi; Rudy Rotili; Stefano Squartini; Francesco Piazza
Affiliations:
3MediaLabs, DIBET, Università Politecnica delle Marche, Ancona, Italy (all authors)
Venue:
Proceedings of the Third COST 2102 international training school conference on Toward autonomous, adaptive, and context-aware multimodal interfaces: theoretical and practical issues
Year:
2010

Abstract

Bayesian estimators, especially the Minimum Mean Square Error (MMSE) and the Maximum A Posteriori (MAP) estimators, are widely used to estimate clean-speech STFT coefficients. Recently, the same approach has been successfully applied to speech feature enhancement for robust Automatic Speech/Speaker Recognition (ASR), in the Mel, log-Mel, or cepstral domain. The quality of the estimate depends directly on the assumptions made about the densities of the noise and speech coefficients. However, while such densities have been studied exhaustively for STFT coefficients, speech features have not received comparable attention. In this paper, we study the distribution of Mel, log-Mel, and MFCC coefficients obtained from speech segments. The histograms of the speech features are first fitted to several pdf models, with the fit assessed by the Chi-Square Goodness-of-Fit test; they are then modeled using a Gaussian Mixture Model (GMM). Computer simulations show that log-Mel and MFCC coefficients are a more convenient choice than Mel coefficients from this perspective.
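
As a rough illustration of the histogram-fitting step described in the abstract, the sketch below computes a Pearson chi-square statistic between a feature histogram and two candidate densities. It is not the authors' procedure: the synthetic Gaussian samples stand in for one feature coefficient, and the candidate set (Gaussian and Laplacian), the bin count, and all function names are assumptions chosen for the example.

```python
import math
import random
import statistics


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian density."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def laplace_cdf(x, mu, b):
    """Cumulative distribution function of a Laplacian density."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)


def chi_square_statistic(samples, cdf, n_bins=20):
    """Pearson chi-square statistic between a histogram and a candidate cdf."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]

    # Observed counts per equal-width bin.
    observed = [0] * n_bins
    for s in samples:
        idx = min(int((s - lo) / width), n_bins - 1)
        observed[idx] += 1

    n = len(samples)
    stat = 0.0
    for i in range(n_bins):
        # The outer bins absorb the tails so the expected counts sum to n.
        p_lo = 0.0 if i == 0 else cdf(edges[i])
        p_hi = 1.0 if i == n_bins - 1 else cdf(edges[i + 1])
        expected = n * (p_hi - p_lo)
        if expected > 0:
            stat += (observed[i] - expected) ** 2 / expected
    return stat


# Synthetic stand-in for one feature coefficient (assumption: real data
# would come from Mel, log-Mel, or MFCC analysis of speech segments).
random.seed(0)
samples = [random.gauss(-2.0, 1.0) for _ in range(5000)]

# Simple parameter estimates for each candidate density.
mu = statistics.fmean(samples)
sigma = statistics.pstdev(samples)
med = statistics.median(samples)
b = statistics.fmean([abs(s - med) for s in samples])

stat_gauss = chi_square_statistic(samples, lambda x: gaussian_cdf(x, mu, sigma))
stat_laplace = chi_square_statistic(samples, lambda x: laplace_cdf(x, med, b))

# A smaller statistic indicates a closer match to the histogram.
print("Gaussian:", stat_gauss)
print("Laplacian:", stat_laplace)
```

A smaller statistic means the candidate density matches the histogram more closely, so ranking candidates by this value gives one simple way to compare pdf models for a given feature domain.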