The need to compile annotated speech databases remains an impediment to the development of automatic speech recognition (ASR) systems in under-resourced multilingual environments. We investigate whether it is possible to combine speech data from different languages spoken within the same multilingual population to improve the overall performance of a speech recognition system. For our investigation, we use recently collected Afrikaans, South African English, Xhosa and Zulu speech databases. Each consists of between 6 and 7 hours of speech that has been annotated at the phonetic and orthographic levels using a common IPA-based phone set. We compare the performance of separate language-specific systems with that of multilingual systems based on straightforward pooling of training data, as well as on a data-driven alternative. For the latter, we extend the decision-tree clustering process normally used to construct tied-state hidden Markov models to allow the inclusion of language-specific questions, and compare the performance of systems that allow sharing between languages with those that do not. We find that multilingual acoustic models obtained in this way show a small but consistent improvement over separate-language systems as well as systems based on IPA-based data pooling.
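The core idea — greedy decision-tree state clustering whose question set includes language-identity questions alongside the usual phonetic ones — can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses one-dimensional toy observations, a single-Gaussian log-likelihood as the split criterion, and hypothetical language codes (`afr`, `zul`, `xho`) and threshold values chosen only for the example.

```python
import math
from collections import namedtuple

# A toy HMM state: its language, its phone label, and pooled observations.
State = namedtuple("State", "language phone data")

def gauss_ll(data):
    """Log-likelihood of 1-D data under a single ML-fitted Gaussian."""
    n = len(data)
    if n < 2:
        return 0.0
    mean = sum(data) / n
    var = max(sum((x - mean) ** 2 for x in data) / n, 1e-6)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def best_split(states, questions):
    """Return the question with the largest likelihood gain over not splitting."""
    parent_ll = gauss_ll([x for s in states for x in s.data])
    best = (None, 0.0, None, None)
    for name, q in questions:
        yes = [s for s in states if q(s)]
        no = [s for s in states if not q(s)]
        if not yes or not no:
            continue  # question does not partition this cluster
        gain = (gauss_ll([x for s in yes for x in s.data])
                + gauss_ll([x for s in no for x in s.data]) - parent_ll)
        if gain > best[1]:
            best = (name, gain, yes, no)
    return best

def cluster(states, questions, min_gain=1.0):
    """Recursively split states; each leaf is one tied-state cluster."""
    name, gain, yes, no = best_split(states, questions)
    if name is None or gain < min_gain:
        return [states]  # stop: no question improves likelihood enough
    return (cluster(yes, questions, min_gain)
            + cluster(no, questions, min_gain))

# Hypothetical setup: Zulu and Xhosa states are acoustically similar,
# Afrikaans differs, so cross-lingual tying should group zul with xho.
states = [
    State("zul", "a", [1.0, 1.1, 0.9, 1.0]),
    State("xho", "a", [0.95, 1.05, 1.0, 1.1]),
    State("afr", "a", [10.0, 9.9, 10.1, 10.05]),
]
questions = [
    ("lang=afr", lambda s: s.language == "afr"),   # language-specific question
    ("lang=zul", lambda s: s.language == "zul"),
    ("nguni", lambda s: s.language in ("zul", "xho")),
]
clusters = cluster(states, questions, min_gain=1.0)
```

Because the language questions compete with all others on likelihood gain, the tree splits on language only where the data justify it; here the Zulu and Xhosa states end up in one shared cluster while the Afrikaans state is kept separate.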