Using different acoustic, lexical and language modeling units for ASR of an under-resourced language - Amharic

  • Authors:
  • Martha Yifiru Tachbelie;Solomon Teferra Abate;Laurent Besacier

  • Affiliations:
  • School of Information Sciences, Addis Ababa University, Addis Ababa, Ethiopia;School of Information Sciences, Addis Ababa University, Addis Ababa, Ethiopia;Laboratoire d'informatique de Grenoble (LIG), Université Joseph Fourier, Grenoble 1, France

  • Venue:
  • Speech Communication
  • Year:
  • 2014

Abstract

State-of-the-art large vocabulary continuous speech recognition systems mostly use phone-based acoustic models (AMs) and word-based lexical and language models. However, phone-based AMs are not efficient at modeling long-term temporal dependencies, and the use of words in lexical and language models leads to the out-of-vocabulary (OOV) problem, which is a serious issue for morphologically rich languages. This paper presents the results of our work on the use of different units for acoustic, lexical and language modeling for an under-resourced language (Amharic, spoken in Ethiopia). Triphone, syllable and hybrid (syllable-phone) units have been investigated for acoustic modeling. Words and morphemes have been investigated for lexical and language modeling. We have also investigated the use of longer (syllable) acoustic units together with shorter (morpheme) lexical and language modeling units in a speech recognition system. Although hybrid AMs did not bring much improvement over context-dependent syllable-based recognizers when used with word-based lexical and language models (i.e. word-based speech recognition), we observed a significant word error rate (WER) reduction compared to triphone-based systems in morpheme-based speech recognition. Syllable AMs also led to a WER reduction over the triphone-based systems in both word-based and morpheme-based speech recognition. It was possible to obtain a 3% absolute WER reduction as a result of using syllable acoustic units in morpheme-based speech recognition. Overall, our results show that syllable and hybrid AMs are best suited to morpheme-based speech recognition.
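The abstract reports its gains as an absolute WER reduction. As a reminder of what that metric measures, here is a minimal sketch of the standard WER definition (word-level edit distance divided by the reference length), along with the difference between absolute and relative reduction. The function names and the sample values in the usage note are illustrative assumptions; they are not taken from the paper and do not reflect the authors' scoring setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, expressed in the same units as the inputs."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Reduction as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer
```

For example, with hypothetical values, moving from a baseline WER of 0.25 to 0.22 is an absolute reduction of about 3 points but a relative reduction of about 12%. In a morpheme-based system the recognizer output consists of morphemes, so whether the error rate is computed over morphemes or over reconstructed words is a scoring choice that this sketch leaves to the caller.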