Environmental robust speech and speaker recognition through multi-channel histogram equalization

Authors:
Stefano Squartini; Emanuele Principi; Rudy Rotili; Francesco Piazza
Affiliations:
3MediaLabs, Department of Information Engineering, Università Politecnica delle Marche, Via Brecce Bianche 1, 60131, Ancona, Italy (all four authors)
Venue:
Neurocomputing
Year:
2012

Abstract

Feature statistics normalization in the cepstral domain is one of the best-performing approaches to robust automatic speech and speaker recognition in noisy acoustic scenarios: feature coefficients are normalized through suitable linear or nonlinear transformations so that the statistics of noisy speech match those of clean speech. Histogram equalization (HEQ) belongs to this category of algorithms; it has proved effective for this purpose and is therefore taken here as the reference. In this paper, the availability of multiple acoustic channels is used to enhance the statistical modeling capabilities of the HEQ algorithm: the multiple noisy occurrences of the same speech are exploited to make the cepstral normalization process as effective as possible. Computer simulations on the Aurora 2 database, in both speech and speaker recognition scenarios, show that a significant recognition improvement can be achieved with respect to the single-channel counterpart and to other multi-channel techniques, confirming the effectiveness of the idea. The proposed algorithmic configuration has also been combined with the kernel estimation technique to further improve speech recognition performance.
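The core HEQ mapping, and one way several channels could feed it, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. It assumes a standard Gaussian reference distribution (in practice the reference CDF is often estimated from clean training data), an empirical CDF for the noisy features, and, for the multi-channel case, that the noisy CDF of each cepstral coefficient is estimated by pooling the frames of all channels before each channel is mapped. The names `heq`, `multichannel_heq`, and `gaussian_quantile` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

# ASSUMPTION: the reference ("clean") distribution is taken to be N(0, 1).
_STD_NORMAL = NormalDist()


def gaussian_quantile(p):
    """Inverse CDF of the assumed N(0, 1) reference, applied elementwise."""
    p = np.asarray(p, dtype=float)
    flat = [_STD_NORMAL.inv_cdf(v) for v in p.ravel()]
    return np.array(flat).reshape(p.shape)


def heq(track, ref_quantile, pooled=None):
    """Histogram-equalize one cepstral coefficient track (1-D, one value per frame).

    The noisy CDF is estimated empirically from `pooled` (defaults to `track`),
    and each value x is mapped to ref_quantile(F_noisy(x)).
    """
    pooled = np.sort(track if pooled is None else pooled)
    n = pooled.size
    # Rank of each value within the pooled samples (number of samples <= x).
    ranks = np.searchsorted(pooled, track, side="right")
    # Midpoint-corrected empirical CDF, clipped to stay strictly inside (0, 1).
    p = np.clip((ranks - 0.5) / n, 0.5 / n, 1.0 - 0.5 / n)
    return ref_quantile(p)


def multichannel_heq(feats, ref_quantile):
    """Equalize features of shape (channels, frames, coeffs).

    HYPOTHETICAL multi-channel variant: for each coefficient, the noisy CDF is
    estimated from the frames of all channels pooled together, so it rests on
    C times more samples than single-channel HEQ; each channel is then mapped
    through that shared CDF.
    """
    feats = np.asarray(feats, dtype=float)
    out = np.empty_like(feats)
    n_ch, _, n_coef = feats.shape
    for d in range(n_coef):
        pooled = feats[:, :, d].ravel()
        for c in range(n_ch):
            out[c, :, d] = heq(feats[c, :, d], ref_quantile, pooled)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal((200, 13))  # synthetic stand-in for clean cepstra
    # Synthetic noisy channels: each shifts, scales, and adds noise to the same frames.
    feats = np.stack([2.0 * clean + 1.0 + 0.3 * rng.standard_normal(clean.shape)
                      for _ in range(4)])
    eq = multichannel_heq(feats, gaussian_quantile)
    print(eq.shape)  # (4, 200, 13)
```

Pooling is the design choice being illustrated: with C channels, the CDF of each coefficient is estimated from C times as many samples, which would mainly help on short utterances where single-channel HEQ has few frames to work with. Swapping the empirical CDF for a kernel-smoothed estimate, in the spirit of the kernel estimation technique the abstract mentions, would change only the line that computes `p`.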