On noise masking for automatic missing data speech recognition: A survey and discussion

Authors:
Christophe Cerisara;Sébastien Demange;Jean-Paul Haton
Affiliations:
LORIA, UMR 7503, Nancy, France;LORIA, UMR 7503, Nancy, France;LORIA, UMR 7503, Nancy, France
Venue:
Computer Speech and Language
Year:
2007

Abstract

Automatic speech recognition (ASR) has reached very high levels of performance in controlled conditions. However, performance degrades significantly when environmental noise is present during recognition. The major challenge today is to achieve good robustness to adverse conditions, so that automatic speech recognizers can be used in real situations. Missing data theory is an attractive and promising approach. Unlike other denoising methods, missing data recognition does not match the whole of the data against the acoustic models; instead, it treats part of the signal as missing, i.e. corrupted by noise. While speech recognition with missing data can be handled efficiently by methods such as data imputation or marginalization, accurately identifying the missing parts (also called masks) remains a very challenging task. This paper reviews the main approaches that have been proposed to address this problem. The objective of this study is to identify the mask estimation methods proposed so far, and to open the domain up to related research that could be adapted to overcome this difficult challenge. To restrict the range of methods, only techniques using a single microphone are considered.
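To make the two steps named in the abstract concrete, the following is a minimal sketch and not the method of any particular surveyed paper. It assumes the features are spectral energies, that a noise energy estimate is already available for each feature, and that each acoustic model state is a single diagonal-covariance Gaussian. The function names `estimate_mask` and `marginal_log_likelihood`, and the 0 dB default threshold, are hypothetical choices made for illustration.

```python
import math


def estimate_mask(noisy_energy, noise_energy, snr_threshold_db=0.0):
    """Return a mask: True where a feature is judged reliable, False where missing.

    Assumption: reliability is decided by thresholding a local SNR estimate,
    with the speech energy approximated by subtracting the noise estimate.
    """
    floor = 1e-10  # avoids log of zero or of a negative value
    mask = []
    for y, n in zip(noisy_energy, noise_energy):
        speech = max(y - n, floor)
        snr_db = 10.0 * math.log10(speech / max(n, floor))
        mask.append(snr_db > snr_threshold_db)
    return mask


def marginal_log_likelihood(features, mask, means, variances):
    """Log-likelihood of a diagonal Gaussian over the reliable features only.

    With a diagonal covariance, integrating out the missing dimensions
    simply removes their terms from the sum.
    """
    total = 0.0
    for x, reliable, mu, var in zip(features, mask, means, variances):
        if reliable:
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Hypothetical three-band frame: only the first band is dominated by speech.
noisy = [10.0, 2.0, 8.0]
noise = [1.0, 1.9, 7.5]
mask = estimate_mask(noisy, noise)
score = marginal_log_likelihood(noisy, mask, [9.0, 0.5, 1.0], [4.0, 1.0, 1.0])
```

In this sketch the quality of `mask` entirely determines which features reach the acoustic model, which is why the abstract singles out mask estimation as the hard part of the problem.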