Using different acoustic, lexical and language modeling units for ASR of an under-resourced language - Amharic

Authors:
Martha Yifiru Tachbelie;Solomon Teferra Abate;Laurent Besacier
Affiliations:
School of Information Sciences, Addis Ababa University, Addis Ababa, Ethiopia;School of Information Sciences, Addis Ababa University, Addis Ababa, Ethiopia;Laboratoire d'informatique de Grenoble (LIG), Université Joseph Fourier, Grenoble 1, France
Venue:
Speech Communication
Year:
2014


Abstract

State-of-the-art large vocabulary continuous speech recognition systems mostly use phone-based acoustic models (AMs) and word-based lexical and language models. However, phone-based AMs are not efficient at modeling long-term temporal dependencies, and the use of words in lexical and language models leads to the out-of-vocabulary (OOV) problem, which is a serious issue for morphologically rich languages. This paper presents the results of our work on the use of different units for acoustic, lexical and language modeling for an under-resourced language (Amharic, spoken in Ethiopia). Triphone, syllable and hybrid (syllable-phone) units have been investigated for acoustic modeling. Words and morphemes have been investigated for lexical and language modeling. We have also investigated the use of longer (syllable) acoustic units and shorter (morpheme) lexical and language modeling units in a speech recognition system. Although hybrid AMs did not bring much improvement over context-dependent syllable-based recognizers when used with word-based lexical and language models (i.e. word-based speech recognition), we observed a significant word error rate (WER) reduction compared to triphone-based systems in morpheme-based speech recognition. Syllable AMs also led to a WER reduction over the triphone-based systems in both word-based and morpheme-based speech recognition. It was possible to obtain a 3% absolute WER reduction as a result of using syllable acoustic units in morpheme-based speech recognition. Overall, our results show that syllable and hybrid AMs are best suited to morpheme-based speech recognition.
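The abstract reports its gains as an absolute WER reduction. As a reading aid, the sketch below shows how WER is conventionally computed (token-level edit distance divided by reference length) and how an absolute reduction differs from a relative one. It is a minimal illustration, not the authors' scoring setup: the function names are hypothetical, and the token strings are toy placeholders rather than data from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance over the number of reference tokens."""
    ref_tokens = reference.split()
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hypothesis.split()) / len(ref_tokens)


def absolute_reduction(baseline, improved):
    """Difference in WER, in the same units as the inputs."""
    return baseline - improved


def relative_reduction(baseline, improved):
    """Difference in WER as a fraction of the baseline WER."""
    return (baseline - improved) / baseline


# Toy placeholder tokens (assumption: not taken from the paper).
reference = "a b c d"
baseline_wer = wer(reference, "a x c")  # one substitution, one deletion
improved_wer = wer(reference, "a b c")  # one deletion
print(baseline_wer, improved_wer)
print(absolute_reduction(baseline_wer, improved_wer))
print(relative_reduction(baseline_wer, improved_wer))
```

For morpheme-based recognition, the tokens passed to such a function would depend on whether the output is scored at the morpheme level or after rejoining morphemes into words; the abstract does not specify this, so the sketch leaves tokenization to the caller.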