Blind source extraction for robust speech recognition in multisource noisy environments

Authors:
Francesco Nesta;Marco Matassoni
Affiliations:
Fondazione Bruno Kessler CIT-irst via Sommarive 18, 38123 Trento, Italy;Fondazione Bruno Kessler CIT-irst via Sommarive 18, 38123 Trento, Italy
Venue:
Computer Speech and Language
Year:
2013


Abstract

This paper describes a complete system for Blind Source Extraction (BSE). The goal is to extract a target source signal so that spoken commands uttered in reverberant, noisy environments and captured by a microphone array can be recognized. The BSE architecture comprises four stages: (a) time difference of arrival (TDOA) estimation, (b) mixing system identification for the target source, (c) on-line semi-blind source separation and (d) source extraction. Combining these stages allows the target signal to be estimated with limited distortion. Although the BSE framework is described in a general form, the system is evaluated here on the data provided for the PASCAL CHiME 2011 challenge, i.e. binaural recordings made in a real-world domestic environment. The CHiME mixtures are processed with the BSE system and the recovered target signal is fed to a recognizer that uses noise-robust features based on Gammatone Frequency Cepstral Coefficients. In addition, acoustic model adaptation is applied to further reduce the mismatch between training and testing data and to improve overall performance. A detailed comparison of different models and algorithmic settings is reported, showing that the approach is promising and that the resulting system significantly reduces the error rate.
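To make stage (a) concrete, the sketch below estimates the TDOA between the two channels of a binaural recording by searching for the lag that maximizes their time-domain cross-correlation. This is only an illustrative baseline under simple assumptions (a single dominant source, little reverberation); the abstract does not specify the paper's own estimator, and the function name `estimate_tdoa` and its parameters are hypothetical.

```python
def estimate_tdoa(left, right, fs, max_lag):
    """Return the delay of `right` relative to `left`, in seconds.

    Illustrative baseline only (plain cross-correlation), not the
    estimator used in the paper. `max_lag` is the largest lag, in
    samples, that is searched in either direction.
    """
    n = min(len(left), len(right))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the samples
        # where both indices are valid.
        start = max(0, -lag)
        stop = min(n, n - lag)
        score = sum(left[i] * right[i + lag] for i in range(start, stop))
        if score > best_score:
            best_lag, best_score = lag, score
    # A positive result means the right channel lags the left one.
    return best_lag / fs
```

For two microphones a few tens of centimetres apart and a 16 kHz sampling rate, `max_lag` would typically be a few tens of samples at most, so the exhaustive search stays cheap. In reverberant or multisource conditions such as those targeted by the paper, a plain cross-correlation peak is often unreliable, which is one reason more robust estimators are used in practice.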