Vietnamese automatic speech recognition: the FLaVoR approach

  • Authors: Quan Vu; Kris Demuynck; Dirk Van Compernolle

  • Affiliations: K.U.Leuven/ESAT/PSI, Leuven, Belgium (all authors)

  • Venue: ISCSLP'06 Proceedings of the 5th international conference on Chinese Spoken Language Processing
  • Year: 2006

Abstract

Automatic speech recognition for languages in Southeast Asia, including Chinese, Thai and Vietnamese, typically models both acoustics and language at the syllable level. This paper presents a new approach to recognizing these languages that exploits information at the word level. The approach, adapted from our FLaVoR architecture[1], consists of two layers. In the first layer, a purely acoustic-phonemic search generates a dense phoneme network enriched with metadata. In the second layer, word decoding is performed on the composition of a series of finite-state transducers (FSTs) that combine knowledge sources at the sub-lexical, lexical and word-based language model levels. Experimental results on the Vietnamese Broadcast News corpus show that the approach is both effective and flexible.
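
To make the two-layer idea concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a toy phoneme network (standing in for the first layer's output) and a toy lexicon transducer whose arc costs already include a word-level language model penalty. The phoneme symbols, words, costs, and the helper names `compose` and `best_path` are all hypothetical. Composition is simplified by assuming the network has no epsilon outputs and the lexicon has no epsilon inputs, so no epsilon filter is needed.

```python
import heapq
from collections import namedtuple

EPS = "<eps>"  # empty output label
Arc = namedtuple("Arc", "src ilabel olabel cost dst")

def _by_source(arcs):
    table = {}
    for arc in arcs:
        table.setdefault(arc.src, []).append(arc)
    return table

def compose(a_arcs, a_start, a_finals, b_arcs, b_start, b_finals):
    """Compose A with B by matching A's output labels to B's input labels.
    Assumes A has no epsilon outputs and B has no epsilon inputs."""
    a_out, b_out = _by_source(a_arcs), _by_source(b_arcs)
    start = (a_start, b_start)
    arcs, finals, seen, stack = [], set(), {start}, [start]
    while stack:
        state = stack.pop()
        qa, qb = state
        if qa in a_finals and qb in b_finals:
            finals.add(state)
        for a in a_out.get(qa, []):
            for b in b_out.get(qb, []):
                if a.olabel != b.ilabel:
                    continue
                nxt = (a.dst, b.dst)
                # Costs add, as in the tropical semiring.
                arcs.append(Arc(state, a.ilabel, b.olabel, a.cost + b.cost, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return arcs, start, finals

def best_path(arcs, start, finals):
    """Dijkstra search; returns (cost, words) for the cheapest accepting path."""
    out = _by_source(arcs)
    heap, done, tie = [(0.0, 0, start, ())], set(), 0
    while heap:
        cost, _, state, words = heapq.heappop(heap)
        if state in done:
            continue
        done.add(state)
        if state in finals:
            return cost, list(words)
        for arc in out.get(state, []):
            tie += 1  # unique tie-breaker so heap entries always compare
            new_words = words if arc.olabel == EPS else words + (arc.olabel,)
            heapq.heappush(heap, (cost + arc.cost, tie, arc.dst, new_words))
    return None

# Layer 1 (assumed output): a phoneme network as an acceptor with acoustic costs.
network = [Arc(0, "x", "x", 0.2, 1), Arc(0, "s", "s", 0.1, 1),
           Arc(1, "i", "i", 0.0, 2),
           Arc(2, "n", "n", 0.1, 3), Arc(2, "m", "m", 0.3, 3)]

# Layer 2 knowledge: lexicon mapping phonemes to words; the first arc of each
# word carries a hypothetical language model cost.
lexicon = [Arc(0, "x", "xin", 0.1, 1), Arc(1, "i", EPS, 0.0, 2), Arc(2, "n", EPS, 0.0, 0),
           Arc(0, "s", "sim", 1.0, 3), Arc(3, "i", EPS, 0.0, 4), Arc(4, "m", EPS, 0.0, 0)]

decoded = compose(network, 0, {3}, lexicon, 0, {0})
cost, words = best_path(*decoded)
print(words, round(cost, 3))
```

In this toy setup the acoustically cheapest phoneme sequence, "s i n", is not a word in the lexicon, so the composed search selects "xin" instead. That is the kind of word-level correction the second layer is meant to provide. A real system would use an FST toolkit with proper epsilon handling and separate transducers for each knowledge source.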