Temporal modulation normalization for robust speech feature extraction and recognition

  • Authors:
  • Xugang Lu; Shigeki Matsuda; Masashi Unoki; Satoshi Nakamura

  • Affiliations:
  • National Institute of Information and Communications Technology, Tokyo, Japan 184-8795; National Institute of Information and Communications Technology, Tokyo, Japan 184-8795; Japan Advanced Institute of Science and Technology, Ishikawa, Japan 923-1292; National Institute of Information and Communications Technology, Tokyo, Japan 184-8795

  • Venue:
  • Multimedia Tools and Applications
  • Year:
  • 2011

Abstract

Speech signals are produced by articulatory movements whose temporal modulation structure is constrained by regular phonetic sequences. This modulation structure encodes most of the intelligibility information in speech and can be used to discriminate speech from noise. In this study, we propose a noise reduction algorithm based on this modulation property of speech. The algorithm involves two steps: temporal modulation contrast normalization and modulation-event-preserving smoothing. The first step normalizes the modulation contrast of clean and noisy speech to the same level; the second smooths out modulation artifacts caused by noise interference. The proposed method can be used on its own for noise reduction, and it can also be combined with traditional noise reduction methods to further reduce the effect of noise. We evaluated the proposed method as a front-end for robust speech recognition on the AURORA-2J corpus. Two advanced noise reduction methods, the ETSI advanced front-end (AFE) and particle filtering (PF) with minimum mean square error (MMSE) estimation, were used for comparison and in combination with our method. Experimental results showed that, as a standalone front-end processor, the proposed method outperformed both advanced methods, and that combining it with either of them consistently improved performance over using each method alone.
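The two processing steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the authors' formulas, so everything below is an assumption rather than the published method: each channel is treated as a temporal trajectory (for example, one log filterbank energy over frames); contrast normalization is approximated by per-channel peak-to-valley (min-max) rescaling, so that clean and noisy trajectories end up with the same dynamic range; and a centered median filter is swapped in as a generic edge-preserving smoother standing in for modulation-event-preserving smoothing. The function names `normalize_range`, `median_smooth`, and `process` are hypothetical.

```python
from statistics import median


def normalize_range(traj):
    """Hypothetical stand-in for modulation contrast normalization:
    rescale one channel's trajectory so its valley maps to 0 and its peak to 1."""
    lo, hi = min(traj), max(traj)
    if hi == lo:  # flat trajectory: no contrast to normalize
        return [0.0 for _ in traj]
    return [(x - lo) / (hi - lo) for x in traj]


def median_smooth(traj, width=3):
    """Stand-in for event-preserving smoothing: a centered median filter
    removes short spikes but keeps sustained steps (onsets/offsets).
    The window is truncated at the trajectory ends."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    n = len(traj)
    return [median(traj[max(0, t - half):min(n, t + half + 1)]) for t in range(n)]


def process(features, width=3):
    """features: list of channels, each a list of per-frame values (assumed layout)."""
    return [median_smooth(normalize_range(ch), width) for ch in features]


if __name__ == "__main__":
    # Clean trajectory vs. a noisy one whose valleys are raised by noise.
    print(normalize_range([0, 10, 0, 10]))  # [0.0, 1.0, 0.0, 1.0]
    print(normalize_range([4, 9, 4, 9]))    # [0.0, 1.0, 0.0, 1.0]
    # A one-frame spike at frame 2 is removed; the step at frame 5 is kept.
    print(process([[4, 4, 9, 4, 4, 9, 9, 9]]))  # [[0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]
```

In this toy example, the clean and noisy trajectories share the same contrast after rescaling, and the median filter suppresses an isolated spike while leaving a sustained onset intact. Min-max scaling is sensitive to outliers, so a percentile-based range would likely be more robust on real features.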