Disambiguation based on WordNet for transliteration of Arabic numerals for Korean TTS

Authors:
Youngim Jung;Aesun Yoon;Hyuk-Chul Kwon
Affiliations:
Department of Computer Science and Engineering, Pusan National University, Busan, S. Korea;Department of French, Pusan National University, Busan, S. Korea;Department of Computer Science and Engineering, Pusan National University, Busan, S. Korea
Venue:
CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2006

Citing 5

C4.5: programs for machine learning
Foundations of statistical natural language processing
Disambiguating the senses of non-text symbols for Mandarin TTS systems with a three-layer classifier. Speech Communication
Word sense disambiguation using Conceptual Density. COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Data Mining

Cited 3

Building a Large-Scale Commonsense Knowledge Base by Converting an Existing One in a Different Language. CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
Hybrid word sense disambiguation using language resources for transliteration of Arabic numerals in Korean. Proceedings of the 2009 International Conference on Hybrid Information Technology
Building Korean classifier ontology based on Korean WordNet. TSD'06 Proceedings of the 9th international conference on Text, Speech and Dialogue

Abstract

Transliterating Arabic numerals is not straightforward. Arabic numerals occur frequently in scientific and informative texts and carry significant meaning. Because their readings depend largely on context, generating accurate pronunciations of Arabic numerals is a critical criterion in evaluating TTS systems. In this paper, (1) contextual, pattern, and arithmetic features are extracted from a transliterated corpus; (2) ambiguities of homographic classifiers are resolved based on the semantic relations in KorLex1.0 (Korean Lexico-Semantic Network); and (3) a classification model for accurate and efficient transliteration of Arabic numerals is proposed to improve Korean TTS systems. The proposed model yields 97.3% accuracy, which is 9.5% higher than that of a customized Korean TTS system.
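
To illustrate the kind of ambiguity the abstract refers to, the following is a toy sketch, not the paper's model. It assumes the general fact that Korean reads numerals with either Sino-Korean or native Korean number words depending on the classifier that follows, and that a homographic classifier such as 분 ("minute" or an honorific counter for people) needs context to resolve. The tables `CLASSIFIER_SENSES` and `HYPERNYM` and the functions `pick_sense` and `read_numeral` are hypothetical; the small hand-made hypernym table merely stands in for semantic relations of the sort a lexical-semantic network like KorLex could supply.

```python
# Toy sketch only: the lexicon, hypernym table, and rules below are
# hypothetical stand-ins, not the paper's features or KorLex data.

# Number words for the digits 1-9 (index 0 is unused).
SINO = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
NATIVE = ["", "한", "두", "세", "네", "다섯", "여섯", "일곱", "여덟", "아홉"]

# Which numeral system each classifier sense takes.
SENSE_SYSTEM = {"minute": "sino", "year": "sino",
                "person": "native", "hour": "native"}

# Possible senses per classifier; 분 is homographic.
CLASSIFIER_SENSES = {"분": ["minute", "person"], "시": ["hour"], "년": ["year"]}

# Tiny hand-made hypernym table standing in for a lexical-semantic network.
HYPERNYM = {"손님": "person", "선생님": "person", "회의": "event"}


def pick_sense(classifier, context):
    """Choose a classifier sense, using context nouns when it is ambiguous."""
    senses = CLASSIFIER_SENSES[classifier]
    if len(senses) == 1:
        return senses[0]
    for word in context:
        concept = HYPERNYM.get(word)
        if concept in senses:
            return concept
    return senses[0]  # fall back to the first listed sense


def read_numeral(digit, classifier, context=()):
    """Return a reading for a single digit (1-9) followed by a classifier."""
    sense = pick_sense(classifier, context)
    table = SINO if SENSE_SYSTEM[sense] == "sino" else NATIVE
    return f"{table[digit]} {classifier}"


print(read_numeral(3, "분", ["손님"]))  # person sense, native Korean reading
print(read_numeral(3, "분", ["회의"]))  # no person noun, falls back to minute
```

In this sketch the same written form "3분" receives different readings depending on whether a person-denoting noun appears nearby. A real system would replace the fallback rule with a trained classifier over richer features, as the abstract describes.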