Collecting and evaluating speech recognition corpora for 11 South African languages

Authors:
Jaco Badenhorst; Charl Heerden; Marelie Davel; Etienne Barnard
Affiliations:
Human Language Technology Competency Area, CSIR Meraka Institute, Pretoria, South Africa; Human Language Technology Competency Area, CSIR Meraka Institute, Pretoria, South Africa; Human Language Technology Competency Area, CSIR Meraka Institute, Pretoria, South Africa; Multilingual Speech Technologies, North-West University, Vanderbijlpark, South Africa 1900
Venue:
Language Resources and Evaluation
Year:
2011

Abstract

We describe the Lwazi corpus for automatic speech recognition (ASR), a new telephone speech corpus which contains data from the eleven official languages of South Africa. Because of practical constraints, the amount of speech per language is relatively small compared to major corpora in world languages, and we report on our investigation of the stability of the ASR models derived from the corpus. We also report on phoneme distance measures across languages, and describe initial phone recognisers that were developed using this data. We find that a surprisingly small number of speakers (fewer than 50) and around 10 to 20 h of speech per language are sufficient for the purposes of acceptable phone-based recognition.