Sparse imputation for large vocabulary noise robust ASR

  • Authors:
  • Jort Florent Gemmeke; Bert Cranen; Ulpu Remes

  • Affiliations:
  • Centre for Language and Speech Technology, Radboud University Nijmegen, P.O. Box 9103, NL-6500 HD Nijmegen, The Netherlands; Centre for Language and Speech Technology, Radboud University Nijmegen, P.O. Box 9103, NL-6500 HD Nijmegen, The Netherlands; Adaptive Informatics Research Centre, Aalto University School of Science and Technology, P.O. Box 15400, FI-00076 Aalto, Finland

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2011

Abstract

An effective way to increase noise robustness in automatic speech recognition is to label the noisy speech features as either reliable or unreliable ('missing'), and replace ('impute') the missing ones by clean speech estimates. Conventional imputation techniques employ parametric models and impute the missing features on a frame-by-frame basis. At low SNRs, frame-based imputation techniques fail because many time frames contain few, if any, reliable features. In previous work, we introduced an exemplar-based method, dubbed sparse imputation, which can impute missing features using reliable features from neighbouring frames. We achieved substantial gains in performance at low SNRs for a connected digit recognition task. In this work, we investigate whether the exemplar-based approach can be generalised to a large vocabulary task. Experiments on artificially corrupted speech show that sparse imputation substantially outperforms a conventional imputation technique when the ideal 'oracle' reliability of features is used. With error-prone estimates of feature reliability, sparse imputation performance is comparable to our baseline imputation technique in the cleanest conditions, and substantially better at lower SNRs. With noisy speech recorded in realistic noise conditions, sparse imputation performs slightly worse than our baseline imputation technique in the cleanest conditions, but substantially better in the noisier conditions.
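The exemplar-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `sparse_impute`, the parameter `n_atoms`, the toy exemplars, and the use of a greedy matching pursuit with non-negative weights are all assumptions made for illustration; the abstract does not specify the solver. Each vector stacks the features of several neighbouring frames, weights for a few clean exemplars are fitted on the reliable features only, and the weighted exemplars then supply estimates for the missing features.

```python
def sparse_impute(observed, reliable, exemplars, n_atoms=2):
    """Impute unreliable features from a sparse combination of clean exemplars.

    observed  : list of floats, features of a window of frames stacked end to end
    reliable  : list of bools, True where the observed feature is trusted
    exemplars : list of clean exemplar vectors, each the same length as observed
    n_atoms   : maximum number of greedy selection steps (hypothetical parameter)
    """
    idx = [i for i, ok in enumerate(reliable) if ok]
    residual = [observed[i] for i in idx]
    weights = [0.0] * len(exemplars)

    # Greedy matching pursuit restricted to the reliable features
    # (an illustrative stand-in for whatever sparse solver is actually used).
    for _ in range(n_atoms):
        best_k, best_w, best_gain = None, 0.0, 0.0
        for k, ex in enumerate(exemplars):
            atom = [ex[i] for i in idx]
            norm2 = sum(v * v for v in atom)
            if norm2 == 0.0:
                continue
            dot = sum(a * r for a, r in zip(atom, residual))
            w = dot / norm2
            gain = dot * dot / norm2  # reduction in squared residual
            if w > 0.0 and gain > best_gain:
                best_k, best_w, best_gain = k, w, gain
        if best_k is None:
            break
        weights[best_k] += best_w
        atom = [exemplars[best_k][i] for i in idx]
        residual = [r - best_w * a for r, a in zip(residual, atom)]

    # Reconstruct the full window from the selected exemplars.
    estimate = [sum(w * ex[i] for w, ex in zip(weights, exemplars))
                for i in range(len(observed))]
    # Keep reliable observations; replace only the missing features.
    return [observed[i] if reliable[i] else estimate[i]
            for i in range(len(observed))]


# Toy window of two frames with two features each (frame-major order).
exemplars = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
observed = [2.0, 4.0, 7.5, 9.0]        # second frame is dominated by noise
reliable = [True, True, False, False]  # second frame has no reliable features
imputed = sparse_impute(observed, reliable, exemplars)
```

In this toy case the second frame contains no reliable features at all, so a frame-by-frame method would have nothing to condition on, while the stacked window lets the reliable features of the first frame determine the estimate for the second.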