Missing-feature reconstruction by leveraging temporal spectral correlation for robust speech recognition in background noise conditions

  • Authors:
  • Wooil Kim; John H. L. Hansen

  • Affiliations:
  • Center for Robust Speech Systems, Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, Richardson, TX (both authors)

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2010

Abstract

This paper proposes a novel missing-feature reconstruction method to improve speech recognition in background noise environments. The existing missing-feature reconstruction method exploits log-spectral correlation across frequency bands. We propose a temporal spectral feature analysis that improves reconstruction performance by leveraging temporal correlation across neighboring frames. As in the conventional method, a Gaussian mixture model is trained on the resulting temporal spectral feature set. The final estimates for missing-feature reconstruction are obtained by selectively combining the original frequency-correlation-based method and the proposed temporal-correlation-based method. Performance is evaluated on the TIMIT speech corpus under various types of background noise and on the CU-Move in-vehicle speech corpus. Experimental results demonstrate that the proposed method is more effective than the original frequency-correlation-based method at increasing speech recognition performance in adverse conditions. With the proposed temporal-frequency-based reconstruction method, a 17.71% average relative improvement in word error rate (WER) is obtained for white, car, speech babble, and background music noise at 5-, 10-, and 15-dB SNR, compared to the original frequency-correlation-based method. We also obtain a 16.72% relative improvement in real-life in-vehicle conditions using data from the CU-Move corpus.
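
The idea of GMM-based reconstruction over temporal features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature layout, the covariance structure, or the rule for selectively combining the two estimates, so everything below is an assumption made for illustration. The sketch assumes a diagonal-covariance GMM, a temporal vector built by stacking one frequency band over neighboring frames, and a plain posterior-weighted mean as the estimate. The function names `temporal_vector` and `reconstruct` are hypothetical.

```python
import math


def temporal_vector(spectrogram, t, band, context=1):
    """Stack one frequency band over frames t-context..t+context.

    spectrogram is a list of frames, each a list of log-spectral values.
    Frame indices beyond the edges are clamped (an assumption of this sketch).
    """
    last = len(spectrogram) - 1
    return [
        spectrogram[min(max(t + k, 0), last)][band]
        for k in range(-context, context + 1)
    ]


def log_gauss_diag(x, mean, var, dims):
    """Log density of a diagonal Gaussian restricted to the dimensions in dims."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var[d]) + (x[d] - mean[d]) ** 2 / var[d])
        for d in dims
    )


def reconstruct(x, reliable, weights, means, variances):
    """Replace unreliable entries of x with posterior-weighted component means.

    reliable[d] is True where x[d] is trusted. The GMM is given by weights,
    means, and variances (diagonal covariances, assumed for simplicity).
    """
    observed = [d for d, ok in enumerate(reliable) if ok]
    # Component posteriors are computed from the reliable dimensions only.
    log_post = [
        math.log(w) + log_gauss_diag(x, m, v, observed)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_post)  # subtract the maximum for numerical stability
    post = [math.exp(lp - top) for lp in log_post]
    total = sum(post)
    post = [p / total for p in post]
    # With diagonal covariances, the conditional mean of a missing dimension
    # within one component is simply that component's mean.
    estimate = list(x)
    for d, ok in enumerate(reliable):
        if not ok:
            estimate[d] = sum(p * m[d] for p, m in zip(post, means))
    return estimate


# Toy example: two components, middle frame of the temporal vector is unreliable.
weights = [0.5, 0.5]
means = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
variances = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
filled = reconstruct([9.8, 0.0, 10.1], [True, False, True], weights, means, variances)
```

In the toy example the two reliable neighbors lie close to the second component, so the unreliable middle value is pulled toward that component's mean. A full-covariance model would additionally shift each component's estimate according to the observed values, which this simplified sketch omits.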