Robust Romanian language automatic speech recognizer based on multistyle training

Authors:
Doru-Petru Munteanu;Constantin-Iulian Vizitiu
Affiliations:
Communications and Electronic Systems Department, Military Technical Academy, Bucharest, Romania;Communications and Electronic Systems Department, Military Technical Academy, Bucharest, Romania
Venue:
WSEAS Transactions on Computer Research
Year:
2008

Abstract

This paper presents solutions for increasing the environmental robustness of a previously developed Romanian-language continuous speech recognizer. All state-of-the-art automatic speech recognizers (ASRs) are data-driven and rely heavily on large amounts of speech data to estimate the model parameters. Most of the available speech corpora used for this training phase contain clean speech recorded in low-noise, reverberation-free environments with high-quality audio equipment. In the real world, however, ASR systems face varied acoustic conditions in which the speech signal is degraded by noise, reverberation, convolutional distortion, etc. The acoustic mismatch between training and testing conditions is the main cause of ASR performance degradation. For instance, the word error rate may be an order of magnitude higher in an office environment than in a clean laboratory environment. Many methods and techniques aim to keep ASR performance at an acceptable level across acoustic conditions. In this paper we present a strategy called multistyle training for building a robust Romanian-language ASR system. The method trains the recognizer on degraded speech obtained by adding artificial noise at various levels to clean speech. The experimental results presented show that this scheme strongly increases the system's robustness to additive noise. The system architecture, based on context-dependent phoneme HMMs, is also described in detail.
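The data-preparation step behind multistyle training, as the abstract describes it, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the noise type, the noise levels, or whether the clean utterances are kept in the training set, so the white Gaussian noise, the SNR values in `SNR_LEVELS_DB`, the retained clean copy, and all function names below are assumptions made for illustration.

```python
import math
import random


def power(signal):
    """Mean power (average squared amplitude) of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def add_noise_at_snr(clean, noise, snr_db):
    """Return clean + g * noise, with g chosen so the mixture has the given SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    p_clean, p_noise = power(clean), power(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("clean and noise signals must have non-zero power")
    # SNR_dB = 10 * log10(P_clean / (g^2 * P_noise)), solved for the gain g.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def multistyle_set(clean_utterances, snr_levels_db, rng):
    """Build a training list: each clean utterance plus one noisy copy per SNR level."""
    training = []
    for utt in clean_utterances:
        training.append(("clean", utt))  # assumption: clean data is kept
        for snr_db in snr_levels_db:
            # Assumption: white Gaussian noise stands in for the artificial noise.
            noise = [rng.gauss(0.0, 1.0) for _ in utt]
            training.append((f"{snr_db}dB", add_noise_at_snr(utt, noise, snr_db)))
    return training


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal, given the clean signal it was built from."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


# Toy "utterance": 0.1 s of a 440 Hz tone sampled at 16 kHz (illustrative only).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
SNR_LEVELS_DB = [20, 10, 5]  # hypothetical levels, not taken from the paper
train = multistyle_set([clean], SNR_LEVELS_DB, rng)
```

In a real system the resulting list would be passed to feature extraction and HMM training. Here `measured_snr_db` only serves to check that each noisy copy reaches its target SNR.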