Large vocabulary Russian speech recognition using syntactico-statistical language modeling

Authors:
Alexey Karpov;Konstantin Markov;Irina Kipyatkova;Daria Vazhenina;Andrey Ronzhin
Affiliations:
St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS), St. Petersburg, Russia;Human Interface Laboratory, The University of Aizu, Fukushima, Japan;St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS), St. Petersburg, Russia;Human Interface Laboratory, The University of Aizu, Fukushima, Japan;St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS), St. Petersburg, Russia
Venue:
Speech Communication
Year:
2014

Abstract

Speech is the most natural form of human communication, and convenient, efficient human-computer interaction requires state-of-the-art spoken language technology. Research in this area has traditionally focused on a few major languages, such as English, French, Spanish, Chinese, and Japanese, while other languages, particularly those of Eastern Europe, have received much less attention. Recently, however, research on speech technologies for Czech, Polish, Serbo-Croatian, and Russian has been steadily increasing. In this paper, we describe our efforts to build a large-vocabulary automatic speech recognition (ASR) system for Russian. Russian is a synthetic, highly inflected language with many roots and affixes, which greatly reduces the performance of ASR systems designed using traditional approaches. In our work, we paid special attention to the specifics of the Russian language when developing the acoustic, lexical, and language models. A special software tool for pronunciation lexicon creation was developed. For the acoustic model, we investigated a combination of knowledge-based and statistical approaches to create several different phoneme sets, the best of which was determined experimentally. For the language model (LM), we introduced a new method that combines syntactic and statistical analysis of the training text data in order to build better n-gram models. Evaluation experiments were performed using two different Russian speech databases and an internally collected text corpus. Among the phoneme sets we created, the set with 47 phonemes achieved the fewest word-level recognition errors, so we used it in the subsequent language modeling evaluations.
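As an illustration of how candidate phoneme sets can be compared by word-level recognition errors, the following minimal Python sketch computes a corpus-level word error rate and picks the system with the lowest value. The abstract does not specify the scoring procedure, so the standard edit-distance definition is assumed here; the function names, the labels `set_a` and `set_b`, and the transliterated toy transcripts are hypothetical and are not data from the paper.

```python
from typing import Dict, List


def word_errors(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of substitutions, deletions and insertions (word level)."""
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(refs: List[str], hyps: List[str]) -> float:
    """Total word errors divided by the total number of reference words."""
    errors = sum(word_errors(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    if words == 0:
        raise ValueError("references must contain at least one word")
    return errors / words


def best_system(refs: List[str], outputs: Dict[str, List[str]]) -> str:
    """Name of the system whose output has the lowest corpus-level WER."""
    return min(outputs, key=lambda name: corpus_wer(refs, outputs[name]))


# Toy data only: hypothetical recognizer outputs for two phoneme sets.
references = ["my edem domoy", "ona chitaet knigu"]
outputs = {
    "set_a": ["my edem dom", "ona chitaet knigu"],
    "set_b": ["my edem domoy", "ona chitaet knigu"],
}
best = best_system(references, outputs)
```

Pooling errors over the whole test set, rather than averaging per-utterance rates, keeps short utterances from dominating the comparison.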
Experiments with an ASR system using a 204-thousand-word vocabulary were performed to compare standard statistical n-gram LMs with the language models created using our syntactico-statistical method. The results demonstrated that the proposed language modeling approach is capable of reducing word recognition errors.
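The abstract does not spell out how syntactic and statistical analysis are combined. One plausible reading, shown below purely as an assumption, is that ordinary adjacent-word bigram counts are supplemented with pairs of non-adjacent words that a syntactic parser marks as grammatically linked. In this sketch the parser output is represented as hypothetical index pairs, the sentence is a transliterated toy example, and probabilities are unsmoothed relative frequencies; none of this is taken from the paper.

```python
from collections import Counter
from typing import List, Tuple

Link = Tuple[int, int]  # (head index, dependent index); hypothetical parser output


def count_bigrams(sentences: List[List[str]],
                  links_per_sentence: List[List[Link]]) -> Counter:
    """Count adjacent bigrams plus syntactically linked non-adjacent pairs."""
    counts: Counter = Counter()
    for words, links in zip(sentences, links_per_sentence):
        # Statistical part: ordinary bigrams of neighbouring words.
        counts.update(zip(words, words[1:]))
        # Syntactic part (assumed): linked words separated by other words
        # are added as extra bigrams, kept in their left-to-right text order.
        for i, j in links:
            first, second = sorted((i, j))
            if second - first > 1:
                counts[(words[first], words[second])] += 1
    return counts


def bigram_prob(counts: Counter, w1: str, w2: str) -> float:
    """Unsmoothed relative frequency of w2 among all pairs starting with w1."""
    total = sum(c for (a, _), c in counts.items() if a == w1)
    return counts[(w1, w2)] / total if total else 0.0


# Toy sentence (transliterated): "the cat that lived in the house sleeps".
# The subject "koshka" and the verb "spit" are five positions apart.
sentence = ["koshka", "kotoraya", "zhila", "v", "dome", "spit"]
links = [(5, 0)]  # hypothetical verb-to-subject dependency
counts = count_bigrams([sentence], [links])
p_long = bigram_prob(counts, "koshka", "spit")
```

A plain bigram model would assign this subject-verb pair no count at all, because the words never occur side by side; the added pair gives the model some evidence for such long-distance agreement, which matters in a highly inflected language with relatively free word order.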