State-of-the-art large vocabulary continuous speech recognition systems mostly use phone-based acoustic models (AMs) and word-based lexical and language models. However, phone-based AMs are not efficient at modeling long-term temporal dependencies, and the use of words in lexical and language models leads to the out-of-vocabulary (OOV) problem, which is a serious issue for morphologically rich languages. This paper presents the results of our work on the use of different units for acoustic, lexical and language modeling for an under-resourced language (Amharic, spoken in Ethiopia). Triphone, syllable and hybrid (syllable-phone) units have been investigated for acoustic modeling. Words and morphemes have been investigated for lexical and language modeling. We have also investigated the combination of longer (syllable) acoustic units with shorter (morpheme) lexical and language modeling units in a speech recognition system. Although hybrid AMs did not bring much improvement over context-dependent syllable-based recognizers when used with word-based lexical and language models (i.e. word-based speech recognition), we observed a significant word error rate (WER) reduction compared to triphone-based systems in morpheme-based speech recognition. Syllable AMs also led to a WER reduction over the triphone-based systems in both word-based and morpheme-based speech recognition. Using syllable acoustic units in morpheme-based speech recognition yielded a 3% absolute WER reduction. Overall, our results show that syllable and hybrid AMs are best suited to morpheme-based speech recognition.
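For readers unfamiliar with the metric, the following is a minimal sketch of how WER is commonly computed (word-level edit distance divided by the number of reference words) and of the difference between an absolute and a relative WER reduction. It is not the authors' scoring tool: the function names are hypothetical, and the example strings and WER values are placeholders chosen only for illustration, not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, in the same units as the inputs."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Reduction expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Placeholder transcripts: one substitution and one deletion over 4 words.
    print(word_error_rate("a b c d", "a x c"))

    # Placeholder WERs (assumed values, not taken from the paper).
    baseline, improved = 0.25, 0.22
    print(absolute_reduction(baseline, improved))
    print(relative_reduction(baseline, improved))
```

An absolute reduction is a difference in percentage points, so the same absolute gain corresponds to a larger relative gain when the baseline WER is lower. In morpheme-based recognition the hypothesis is typically rejoined into words before scoring so that results stay comparable with word-based systems; whether that was done here is not stated in the abstract.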