Comparative analysis of transliteration techniques based on statistical machine translation and joint-sequence model

  • Authors:
  • Nam X. Cao; Nhut M. Pham; Quan H. Vu

  • Affiliations:
  • University of Science, Vietnam; University of Science, Vietnam; University of Science, Vietnam

  • Venue:
  • Proceedings of the 2010 Symposium on Information and Communication Technology
  • Year:
  • 2010

Abstract

The inability to handle foreign-language words poses difficulties for both Vietnamese speech recognition and text-to-speech systems. A common solution is dictionary lookup, but a dictionary has a finite number of entries, whereas speech recognition and text-to-speech systems are expected to handle arbitrary words. Alternatively, data-driven approaches can transliterate a foreign word into its Vietnamese pronunciation by learning from samples and then predicting the pronunciation of unseen words. This paper presents a comparative analysis of two data-driven approaches, one based on statistical machine translation and the other on a joint-sequence model. Two systems based on these approaches are developed and tested using the same experimental protocol and a dataset of 8050 English words. Results show that the joint-sequence model outperforms statistical machine translation in English-to-Vietnamese transliteration.