Language-dependent state clustering for multilingual acoustic modelling

  • Authors: Thomas Niesler
  • Affiliation: Department of Electrical and Electronic Engineering, University of Stellenbosch, Stellenbosch, South Africa
  • Venue: Speech Communication
  • Year: 2007

Abstract

The need to compile annotated speech databases remains an impediment to the development of automatic speech recognition (ASR) systems in under-resourced multilingual environments. We investigate whether it is possible to combine speech data from different languages spoken within the same multilingual population to improve the overall performance of a speech recognition system. For our investigation, we use recently collected Afrikaans, South African English, Xhosa and Zulu speech databases. Each consists of between 6 and 7 hours of speech that has been annotated at the phonetic and orthographic levels using a common IPA-based phone set. We compare the performance of separate language-specific systems with that of multilingual systems based on straightforward pooling of training data as well as on a data-driven alternative. For the latter, we extend the decision-tree clustering process normally used to construct tied-state hidden Markov models to allow the inclusion of language-specific questions, and compare the performance of systems that allow sharing between languages with those that do not. We find that multilingual acoustic models obtained in this way show a small but consistent improvement over both separate-language systems and systems based on IPA data pooling.
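The core idea of extending decision-tree state clustering with language questions can be illustrated with a small sketch. This is not the paper's implementation (real systems such as HTK accumulate multivariate Gaussian occupation statistics per HMM state); here each candidate state carries simplified one-dimensional sufficient statistics, and a greedy splitting step picks whichever question, phonetic or language-specific, yields the largest likelihood gain. All names (`StateStats`, `best_split`, the tag keys `lang` and `phone`) are illustrative assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class StateStats:
    """Sufficient statistics of a candidate tied state (1-D Gaussian here)."""
    n: float    # occupation count
    sx: float   # sum of observations
    sx2: float  # sum of squared observations

    def loglik(self):
        # Log-likelihood of the data under the ML Gaussian fit.
        if self.n <= 0:
            return 0.0
        mean = self.sx / self.n
        var = max(self.sx2 / self.n - mean * mean, 1e-6)
        return -0.5 * self.n * (math.log(2 * math.pi * var) + 1.0)

def merge(stats):
    # Pool statistics of several states into one tied state.
    return StateStats(sum(s.n for s in stats),
                      sum(s.sx for s in stats),
                      sum(s.sx2 for s in stats))

def best_split(leaf, questions):
    """One greedy top-down clustering step: return the (name, gain) of the
    question with the largest log-likelihood gain.  Each item in `leaf` is
    (tags, StateStats); a question is (name, predicate on tags).  Language
    questions are simply predicates on a 'lang' tag, considered alongside
    the usual phonetic-context questions, so the tree itself decides where
    cross-language sharing helps and where languages should stay apart."""
    base = merge([s for _, s in leaf]).loglik()
    best = None
    for name, pred in questions:
        yes = [s for t, s in leaf if pred(t)]
        no = [s for t, s in leaf if not pred(t)]
        if not yes or not no:
            continue  # question does not partition this leaf
        gain = merge(yes).loglik() + merge(no).loglik() - base
        if best is None or gain > best[1]:
            best = (name, gain)
    return best
```

With such a setup, a leaf whose states differ mainly by language will be split by a language question, while acoustically similar states from different languages remain tied, which is the sharing behaviour the abstract describes.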