Acoustic data collection for automatic speech recognition (ASR) is particularly challenging when working with under-resourced languages, many of which are found in the developing world. We provide a brief overview of related data collection strategies, highlighting the salient issues in collecting ASR data for under-resourced languages. We then describe the development of a smartphone-based data collection tool, Woefzela, which is designed to function in a developing-world context. Specifically, the tool functions without any Internet connectivity, remains portable, and allows multiple sessions to be collected in parallel. It also simplifies data collection by providing process support to the various role players involved, and performs on-device quality control in order to maximise the use of recording opportunities. The use of the tool is demonstrated as part of a South African data collection project, during which almost 800 hours of ASR data were collected, often in remote, rural areas, and subsequently used to successfully build acoustic models for eleven languages. The on-device quality control mechanism (referred to as QC-on-the-go) is an interesting aspect of Woefzela, and we discuss this functionality in more detail. We experiment with different uses of quality control information and evaluate their impact on ASR accuracy. Woefzela was developed for the Android operating system and is freely available for use on Android smartphones.