Mask estimation and imputation methods for missing data speech recognition in a multisource reverberant environment

Authors:
Sami Keronen; Heikki Kallasjoki; Ulpu Remes; Guy J. Brown; Jort F. Gemmeke; Kalle J. Palomäki
Affiliations:
Aalto University School of Science, Department of Information and Computer Science, PO Box 15400, FI-00076 Aalto, Finland; Aalto University School of Science, Department of Information and Computer Science, PO Box 15400, FI-00076 Aalto, Finland; Aalto University School of Science, Department of Information and Computer Science, PO Box 15400, FI-00076 Aalto, Finland; University of Sheffield, Department of Computer Science, Regent Court, 211 Portobello St., Sheffield S1 4DP, UK; KU Leuven, Department ESAT-PSI, Kasteelpark Arenberg 10, 3001 Heverlee, Belgium; Aalto University School of Science, Department of Information and Computer Science, PO Box 15400, FI-00076 Aalto, Finland
Venue:
Computer Speech and Language
Year:
2013

Abstract

We present an automatic speech recognition system that uses a missing data approach to compensate for challenging environmental noise containing both additive and convolutive components. The unreliable, noise-corrupted ("missing") components are identified using a Gaussian mixture model (GMM) classifier based on a diverse range of acoustic features. To perform speech recognition on the partially observed data, the missing components are replaced with clean speech estimates computed by two alternative methods: sparse imputation and cluster-based GMM imputation. Compared with two reference mask estimation techniques based on pairs of interaural level and time differences, the proposed missing data approach significantly improved keyword accuracy in all signal-to-noise ratio conditions when evaluated on the CHiME reverberant multisource environment corpus. Of the two imputation methods, cluster-based imputation outperformed sparse imputation. The highest keyword accuracy was achieved when the system was trained on imputed data, which made it more robust to possible imputation errors.
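To illustrate what cluster-based GMM imputation does with a mask, here is a minimal sketch for a single frame. It is not the authors' implementation. It assumes a diagonal-covariance GMM (`weights`, `means`, `variances`) trained on clean log-spectral features, and a binary mask that is already given (`True` = reliable). The function name `cluster_impute` and the `bounded` option are hypothetical. With `bounded=True`, the sketch additionally assumes that the clean value of a missing component cannot exceed the observed noisy value, as holds for additive noise in the spectral domain. Under that assumption, the mixture posteriors use the probability mass below that bound, and each component contributes the mean of its upper-truncated Gaussian.

```python
import math
from typing import List, Sequence


def _log_norm_pdf(x: float, mu: float, var: float) -> float:
    """Log density of a univariate Gaussian N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def _norm_cdf(z: float) -> float:
    """Standard normal CDF, clamped away from zero to keep logs finite."""
    return max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300)


def cluster_impute(y: Sequence[float], mask: Sequence[bool],
                   weights: Sequence[float],
                   means: Sequence[Sequence[float]],
                   variances: Sequence[Sequence[float]],
                   bounded: bool = True) -> List[float]:
    """Hypothetical sketch: replace unreliable components of frame y
    with a posterior-weighted clean-speech estimate from a diagonal GMM."""
    dims, n_comp = len(y), len(weights)

    # 1) Mixture posteriors computed from the reliable components only;
    #    with bounding, missing dims contribute P(x_d <= y_d | component).
    log_post = []
    for k in range(n_comp):
        lp = math.log(weights[k])
        for d in range(dims):
            mu, var = means[k][d], variances[k][d]
            if mask[d]:
                lp += _log_norm_pdf(y[d], mu, var)
            elif bounded:
                lp += math.log(_norm_cdf((y[d] - mu) / math.sqrt(var)))
        log_post.append(lp)
    peak = max(log_post)
    post = [math.exp(lp - peak) for lp in log_post]  # stable softmax
    total = sum(post)
    post = [p / total for p in post]

    # 2) Reliable components are kept; missing ones are re-estimated.
    x_hat = list(y)
    for d in range(dims):
        if mask[d]:
            continue
        est = 0.0
        for k in range(n_comp):
            mu, var = means[k][d], variances[k][d]
            if bounded:
                sd = math.sqrt(var)
                a = (y[d] - mu) / sd
                pdf_a = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
                # Mean of N(mu, var) truncated above at y[d]; the min()
                # guards against underflow when a is very negative.
                cond = min(mu - sd * pdf_a / _norm_cdf(a), y[d])
            else:
                cond = mu  # diagonal GMM: conditional mean is the mean
            est += post[k] * cond
        x_hat[d] = est
    return x_hat
```

For example, `cluster_impute([10.0, 5.0], [True, False], [0.5, 0.5], [[0.0, 0.0], [10.0, 10.0]], [[1.0, 1.0], [1.0, 1.0]])` assigns almost all posterior mass to the second component, because the reliable value 10.0 matches its mean. The missing component is therefore estimated from that component's mean truncated at the observed 5.0, which yields a value just below 5.0. Sparse imputation, the other method compared in the abstract, is not sketched here.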