A context-sensitive homograph disambiguation in Thai text-to-speech synthesis

  • Authors:
  • Virongrong Tesprasit;Paisarn Charoenpornsawat;Virach Sornlertlamvanich

  • Affiliations:
  • National Electronics and Computer Technology Center, Klong Luang, Pathumthani, Thailand;National Electronics and Computer Technology Center, Klong Luang, Pathumthani, Thailand;National Electronics and Computer Technology Center, Klong Luang, Pathumthani, Thailand

  • Venue:
  • NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
  • Year:
  • 2003

Abstract

Homograph ambiguity is a fundamental problem in text-to-speech (TTS) synthesis. Several efficient approaches to homograph disambiguation have been proposed, such as part-of-speech (POS) n-gram, Bayesian classifier, decision tree, and Bayesian-hybrid approaches. These methods rely on the words and/or POS tags surrounding the homograph in question. Some languages, such as Thai, Chinese, and Japanese, have no word-boundary delimiter, so word boundaries must be identified before homograph ambiguity can be resolved. In this paper, we propose a single framework that solves the word segmentation and homograph ambiguity problems together. Our model employs both local and long-distance contexts, which are automatically extracted by a machine learning technique called Winnow.
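
For readers unfamiliar with Winnow, the following is a minimal sketch of the textbook binary Winnow learner (a mistake-driven linear classifier with multiplicative weight updates), not the authors' implementation. The abstract does not give the feature set, parameters, or how multiple pronunciations are handled, so everything specific here is an assumption for illustration: the feature names (`local:prev=W1`, `far:C1`, and so on), the threshold of 2.0, the promotion and demotion factors, and the reduction to two pronunciations (True for one reading, False for the other).

```python
from typing import Dict, List, Set, Tuple

Example = Tuple[Set[str], bool]  # (active context features, label)


class Winnow:
    """Textbook binary Winnow over sparse binary features."""

    def __init__(self, threshold: float = 2.0,
                 promotion: float = 2.0, demotion: float = 0.5) -> None:
        # All three parameters are illustrative assumptions.
        self.threshold = threshold
        self.promotion = promotion
        self.demotion = demotion
        self.weights: Dict[str, float] = {}  # unseen features default to 1.0

    def score(self, active: Set[str]) -> float:
        # Sum the weights of the features that are active in this context.
        return sum(self.weights.get(f, 1.0) for f in active)

    def predict(self, active: Set[str]) -> bool:
        return self.score(active) > self.threshold

    def update(self, active: Set[str], label: bool) -> bool:
        """Update on a mistake only; return True if a mistake was made."""
        if self.predict(active) == label:
            return False
        # Missed positive: promote active features. False alarm: demote them.
        factor = self.promotion if label else self.demotion
        for f in active:
            self.weights[f] = self.weights.get(f, 1.0) * factor
        return True


def train(examples: List[Example], max_epochs: int = 10) -> Winnow:
    model = Winnow()
    for _ in range(max_epochs):
        mistakes = sum(model.update(feats, label) for feats, label in examples)
        if mistakes == 0:  # every training example is classified correctly
            break
    return model


# Hypothetical toy data: "local:" features stand for adjacent words,
# "far:" features for words occurring anywhere in a wider window.
EXAMPLES: List[Example] = [
    ({"local:prev=W1", "far:C1"}, True),
    ({"local:prev=W1", "far:C2"}, False),
    ({"local:next=W2", "far:C2"}, False),
    ({"local:next=W3", "far:C1"}, True),
]

if __name__ == "__main__":
    model = train(EXAMPLES)
    for feats, label in EXAMPLES:
        print(sorted(feats), "->", model.predict(feats), "(gold:", label, ")")
```

In this toy run the uninformative local feature `local:prev=W1` is promoted and then demoted back to its initial weight, while the long-distance features `far:C1` and `far:C2` end up carrying the decision. This illustrates how mistake-driven multiplicative updates let the learner pick out the relevant context features from a larger candidate set.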