Sparse imputation for large vocabulary noise robust ASR

Authors:
Jort Florent Gemmeke; Bert Cranen; Ulpu Remes
Affiliations:
Centre for Language and Speech Technology, Radboud University Nijmegen, P.O. Box 9103, NL-6500 HD Nijmegen, The Netherlands; Centre for Language and Speech Technology, Radboud University Nijmegen, P.O. Box 9103, NL-6500 HD Nijmegen, The Netherlands; Adaptive Informatics Research Centre, Aalto University School of Science and Technology, P.O. Box 15400, FI-00076 Aalto, Finland
Venue:
Computer Speech and Language
Year:
2011

Abstract

An effective way to increase noise robustness in automatic speech recognition is to label the noisy speech features as either reliable or unreliable ('missing'), and replace ('impute') the missing ones by clean speech estimates. Conventional imputation techniques employ parametric models and impute the missing features on a frame-by-frame basis. At low SNRs, frame-based imputation techniques fail because many time frames contain few, if any, reliable features. In previous work, we introduced an exemplar-based method, dubbed sparse imputation, which can impute missing features using reliable features from neighbouring frames. We achieved substantial gains in performance at low SNRs for a connected digit recognition task. In this work, we investigate whether the exemplar-based approach can be generalised to a large vocabulary task. Experiments on artificially corrupted speech show that sparse imputation substantially outperforms a conventional imputation technique when the ideal 'oracle' reliability of features is used. With error-prone estimates of feature reliability, sparse imputation performance is comparable to our baseline imputation technique in the cleanest conditions, and substantially better at lower SNRs. With noisy speech recorded in realistic noise conditions, sparse imputation performs slightly worse than our baseline imputation technique in the cleanest conditions, but substantially better in the noisier conditions.
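The exemplar-based idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a solver, so the non-negative iterative soft-thresholding loop below is a stand-in, and the names sparse_impute, lam and n_iter are hypothetical. The sketch assumes that each observation is a window of stacked neighbouring frames, that a dictionary of clean-speech exemplars with the same layout is available, and that a reliability mask has already been estimated. A sparse combination of exemplars is fitted to the reliable features only, and the same combination then supplies estimates for the missing ones.

```python
import numpy as np

def sparse_impute(y, mask, A, lam=0.01, n_iter=500):
    """Impute unreliable entries of y from a sparse combination of exemplars.

    y    : (d,) noisy feature vector (assumed: a window of stacked frames)
    mask : (d,) boolean array, True where a feature is labelled reliable
    A    : (d, n) dictionary whose columns are clean-speech exemplars
    lam  : sparsity weight (hypothetical default)
    Returns the imputed vector and the exemplar weights.
    """
    x = np.zeros(A.shape[1])
    if not mask.any():
        # No reliable evidence in this window: nothing to fit against.
        return y.copy(), x

    A_r, y_r = A[mask], y[mask]           # keep reliable rows only
    L = np.linalg.norm(A_r, 2) ** 2       # Lipschitz constant of the gradient
    if L == 0.0:
        return y.copy(), x

    for _ in range(n_iter):
        grad = A_r.T @ (A_r @ x - y_r)    # gradient of 0.5 * ||A_r x - y_r||^2
        x = x - grad / L                  # gradient step
        x = np.maximum(x - lam / L, 0.0)  # soft-threshold, keep weights >= 0

    y_hat = A @ x                         # reconstruct the full window
    out = y.copy()
    out[~mask] = y_hat[~mask]             # replace only the missing features
    return out, x
```

Because the fit uses every reliable feature in the window, a frame with no reliable features of its own can still be reconstructed from its neighbours, which is the property the abstract credits for the gains at low SNRs.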