Robust Romanian language automatic speech recognizer based on multistyle training

  • Authors:
  • Doru-Petru Munteanu;Constantin-Iulian Vizitiu

  • Affiliations:
  • Communications and Electronic Systems Department, Military Technical Academy, Bucharest, Romania;Communications and Electronic Systems Department, Military Technical Academy, Bucharest, Romania

  • Venue:
  • WSEAS Transactions on Computer Research
  • Year:
  • 2008

Abstract

This paper presents solutions for increasing the environmental robustness of a previously developed Romanian-language continuous speech recognizer. All state-of-the-art automatic speech recognizers (ASR) are data-driven and rely heavily on large amounts of speech data for estimating the model parameters. Most of the available speech corpora used for this training phase contain clean speech recorded in low-noise, reverberation-free environments with high-quality audio equipment. In real-world use, however, ASR systems face varied acoustic conditions in which the speech signal is degraded by noise, reverberation, convolutional distortion, etc. The acoustic mismatch between training and testing conditions is the main cause of ASR performance degradation. For instance, the word error rate may be an order of magnitude higher in an office environment than in a clean laboratory environment. Many methods and techniques aim to keep ASR performance at an acceptable level across various acoustic conditions. In this paper we present a strategy called multistyle training for building a robust Romanian-language ASR system. The method is based on training the recognizer with degraded speech obtained by adding artificial noise at various levels to clean speech. The experimental results presented show that this scheme strongly increases the system's robustness to additive noise. The system architecture, based on context-dependent phoneme HMMs, is also described in detail.
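The data-preparation step behind multistyle training, adding artificial noise to clean speech at several levels, can be illustrated with a minimal sketch. The abstract does not specify the noise type, the noise levels, or the sampling rate, so everything concrete here is an assumption made for illustration: white Gaussian noise, SNR levels of 20, 10, 5 and 0 dB, a 16 kHz synthetic tone standing in for speech, and the hypothetical helper names `mix_at_snr`, `measured_snr_db` and `multistyle_set`. It is not the authors' implementation.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean signal, scaled so the mixture has the given SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # SNR_dB = 10*log10(P_clean / (g^2 * P_noise)), solved for the noise gain g.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal, given the clean signal it was built from."""
    p_clean = sum(x * x for x in clean)
    p_resid = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(p_clean / p_resid)


def multistyle_set(clean, snr_levels_db, rng):
    """Return {snr_db: degraded copy}, plus the clean copy under the key None.

    Assumption: white Gaussian noise; the paper may use other noise types.
    """
    styles = {None: list(clean)}
    for snr_db in snr_levels_db:
        noise = [rng.gauss(0.0, 1.0) for _ in clean]
        styles[snr_db] = mix_at_snr(clean, noise, snr_db)
    return styles


# Stand-in for a clean utterance: 0.1 s of a 440 Hz tone at an assumed 16 kHz rate.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
styles = multistyle_set(clean, [20, 10, 5, 0], rng)

for snr_db, signal in styles.items():
    if snr_db is not None:
        print(snr_db, round(measured_snr_db(clean, signal), 2))
```

In a multistyle setup, the clean copy and all degraded copies of each utterance would be pooled into one training set, so that the acoustic models see the same speech under several noise conditions.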