Vietnamese automatic speech recognition: the FLaVoR approach

Authors:
Quan Vu; Kris Demuynck; Dirk Van Compernolle
Affiliations:
K.U.Leuven/ESAT/PSI, Leuven, Belgium; K.U.Leuven/ESAT/PSI, Leuven, Belgium; K.U.Leuven/ESAT/PSI, Leuven, Belgium
Venue:
ISCSLP'06 Proceedings of the 5th international conference on Chinese Spoken Language Processing
Year:
2006

Abstract

Automatic speech recognition for languages of East and Southeast Asia, including Chinese, Thai and Vietnamese, typically models both acoustics and language at the syllable level. This paper presents a new approach to recognizing those languages that exploits information at the word level. The new approach, adapted from our FLaVoR architecture [1], consists of two layers. In the first layer, a purely acoustic-phonemic search generates a dense phoneme network enriched with metadata. In the second layer, word decoding is performed on the composition of a series of finite state transducers (FSTs), which combines knowledge sources at the sub-lexical, lexical and word-based language model levels. Experimental results on the Vietnamese Broadcast News corpus show that our approach is both effective and flexible.
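
The second-layer idea, decoding words by composing transducers over a phoneme network, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `Fst`, the functions `compose`, `best_path` and `decode`, the toy phoneme symbols, the three-word lexicon and all costs are hypothetical and chosen only for illustration. The sketch assumes non-negative costs and a right-hand transducer without empty input labels; a real system would rely on a full FST toolkit.

```python
import heapq
from collections import defaultdict

EPS = "<eps>"  # empty label (assumed convention for this sketch)

class Fst:
    """Weighted transducer with tropical (min, +) costs."""
    def __init__(self, start, finals=()):
        self.start, self.finals = start, set(finals)
        self.arcs = defaultdict(list)  # state -> [(in, out, cost, next)]

    def add_arc(self, src, ilabel, olabel, cost, dst):
        self.arcs[src].append((ilabel, olabel, cost, dst))

def compose(a, b):
    """Compose a with b, assuming b has no EPS input labels."""
    start = (a.start, b.start)
    result, stack, seen = Fst(start), [start], {start}
    while stack:
        qa, qb = state = stack.pop()
        if qa in a.finals and qb in b.finals:
            result.finals.add(state)
        for ia, oa, ca, na in a.arcs[qa]:
            if oa == EPS:  # a emits nothing: advance a only
                matches = [(EPS, ca, (na, qb))]
            else:          # a's output must match b's input
                matches = [(ob, ca + cb, (na, nb))
                           for ib, ob, cb, nb in b.arcs[qb] if ib == oa]
            for out, cost, nxt in matches:
                result.add_arc(state, ia, out, cost, nxt)
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return result

def best_path(fst):
    """Dijkstra search; returns (cost, output labels) or None."""
    heap, done, tie = [(0.0, 0, fst.start, ())], set(), 0
    while heap:
        cost, _, state, outs = heapq.heappop(heap)
        if state in done:
            continue
        done.add(state)
        if state in fst.finals:
            return cost, [o for o in outs if o != EPS]
        for _, out, c, nxt in fst.arcs[state]:
            tie += 1  # tie-breaker so heap never compares states
            heapq.heappush(heap, (cost + c, tie, nxt, outs + (out,)))
    return None

def decode(lattice, lexicon, grammar):
    return best_path(compose(compose(lattice, lexicon), grammar))

# Layer 1 output (toy): phoneme network with one competing arc ("t" vs "k").
lattice = Fst(0, [6])
for i, ph in enumerate(["h", "o", "k", "s", "i", "n"]):
    lattice.add_arc(i, ph, ph, 0.1, i + 1)
lattice.add_arc(2, "t", "t", 0.05, 3)  # acoustically cheaper, but not in any word

# Lexicon: phoneme sequences -> words; the word label sits on the first arc.
lexicon, next_state = Fst(0, [0]), 1
for word, phones in [("hoc", "hok"), ("sinh", "sin"), ("hoc_sinh", "hoksin")]:
    src = 0
    for j, ph in enumerate(phones):
        dst = 0 if j == len(phones) - 1 else next_state
        lexicon.add_arc(src, ph, word if j == 0 else EPS, 0.0, dst)
        src, next_state = dst, next_state + 1

# Word-level unigram language model as a one-state acceptor.
grammar = Fst(0, [0])
for word, cost in [("hoc", 2.0), ("sinh", 2.0), ("hoc_sinh", 1.0)]:
    grammar.add_arc(0, word, word, cost, 0)

cost, words = decode(lattice, lexicon, grammar)
print(words, round(cost, 3))
```

In this toy setup the lexicon discards the acoustically cheaper "t" arc because no word contains it, and the word-level costs make the single two-syllable word cheaper than the sequence of two one-syllable words. That is the kind of word-level information a purely syllable-based model would not use.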