Incorporating the voicing information into HMM-based automatic speech recognition in noisy environments

  • Authors:
  • Peter Jančovič;Münevver Köküer

  • Affiliations:
  • School of Electronic, Electrical and Computer Engineering, University of Birmingham, Pritchatts Road, B15 2TT Birmingham, UK;School of Electronic, Electrical and Computer Engineering, University of Birmingham, Pritchatts Road, B15 2TT Birmingham, UK

  • Venue:
  • Speech Communication
  • Year:
  • 2009

Abstract

In this paper, we propose a model for incorporating voicing information into a speech recognition system operating in noisy environments. The voicing information is estimated by a novel method that provides it for each filter-bank channel and does not require information about the fundamental frequency. The voicing information is modelled with the Bernoulli distribution, and a voicing model is obtained for each HMM state and mixture by a Viterbi-style training procedure. The proposed voicing incorporation is evaluated within a standard model and within two models that compensate for the effect of noise: the missing-feature model and the multi-conditional training model. Experiments are first performed on noisy speech data from the Aurora 2 database, where significant performance improvements are achieved when the voicing information is incorporated into the standard model as well as into the noise-compensated models. The use of voicing information is also demonstrated on a phoneme recognition task on the noise-corrupted TIMIT database, where considerable improvements are observed.
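
The Bernoulli voicing model mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the independence assumption across filter-bank channels, the clamping constant `eps`, and the toy data are all assumptions made for illustration. The sketch assumes that each frame carries one binary voiced/unvoiced flag per channel, and that a Viterbi-style alignment has already assigned a set of frames to one HMM state and mixture, so that the per-channel voicing probability can be estimated as a relative frequency.

```python
import math


def estimate_voicing_probs(aligned_frames):
    """Estimate P(voiced) per channel from frames aligned to one state/mixture.

    aligned_frames: list of binary voicing vectors (one 0/1 flag per channel).
    The relative frequency is the maximum-likelihood Bernoulli estimate.
    """
    n_frames = len(aligned_frames)
    n_channels = len(aligned_frames[0])
    return [
        sum(frame[c] for frame in aligned_frames) / n_frames
        for c in range(n_channels)
    ]


def bernoulli_log_likelihood(voicing, probs, eps=1e-6):
    """Log-probability of a binary voicing vector, channels assumed independent.

    eps (hypothetical) keeps probabilities away from 0 and 1 so that log()
    stays finite for events never seen in training.
    """
    total = 0.0
    for v, p in zip(voicing, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if v else math.log(1.0 - p)
    return total


# Toy data: four frames, three filter-bank channels, aligned to one state.
aligned = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
probs = estimate_voicing_probs(aligned)

# A voicing pattern typical of this state scores higher than an atypical one.
typical = bernoulli_log_likelihood([1, 1, 0], probs)
atypical = bernoulli_log_likelihood([0, 0, 1], probs)
```

In a recogniser, a score of this kind would presumably be combined with the acoustic likelihood of the same state and mixture; how the two are combined is not stated in the abstract and is left out of the sketch.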