Disambiguating the senses of non-text symbols for Mandarin TTS systems with a three-layer classifier

Authors:
Ming-Shing Yu;Feng-Long Huang
Affiliations:
Text-To-Speech System Laboratory, Department of Applied Mathematics, National Chung-Hsing University, Taichung 40227, Taiwan, ROC;Text-To-Speech System Laboratory, Department of Applied Mathematics, National Chung-Hsing University, Taichung 40227, Taiwan, ROC
Venue:
Speech Communication
Year:
2003

Abstract

Texts contain many kinds of non-text symbols, and the way a symbol is read aloud can vary with its sense. This paper proposes a three-layer classifier (TLC) that disambiguates the senses of these symbols effectively. The layers of the TLC are applied in sequence. The 1st layer consists of two components, a pattern table and a decision tree. If this layer can disambiguate the sense of the target symbol, the task stops there; otherwise the remaining two layers are triggered. Based on Bayesian theory, the 2nd layer adopts a voting scheme to compute a disambiguation score. Several token features that may affect the effectiveness of the voting scheme are analyzed and compared in order to achieve better accuracy. Depending on the confidence of the sense disambiguation, the 3rd layer may exploit an alternative model to enhance performance. Experiments show that the approach learns well even from a small amount of data. The overall accuracies on the training and testing sets are 99.8% and 97.5%, respectively.
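
The cascade described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the single regex standing in for the pattern table (the decision tree is omitted), the naive-Bayes-style log score standing in for the voting scheme, the score margin used as the confidence measure, the fixed-answer fallback standing in for the alternative model, and the toy English context tokens for the symbol "/".

```python
import math
import re
from collections import Counter, defaultdict

# Layer 1 (assumed form): a tiny pattern table mapping a surface pattern to a sense.
PATTERNS = [(re.compile(r"\d{4}/\d{1,2}/\d{1,2}"), "date")]


def layer1(text):
    """Return a sense if any pattern matches, else None."""
    for pattern, sense in PATTERNS:
        if pattern.search(text):
            return sense
    return None


class VotingLayer:
    """Layer 2 (assumed form): each context token adds a smoothed log-likelihood vote."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # add-alpha smoothing constant
        self.sense_counts = Counter()             # number of training examples per sense
        self.token_counts = defaultdict(Counter)  # token counts per sense
        self.vocab = set()

    def fit(self, examples):
        for tokens, sense in examples:
            self.sense_counts[sense] += 1
            for token in tokens:
                self.token_counts[sense][token] += 1
                self.vocab.add(token)

    def scores(self, tokens):
        total = sum(self.sense_counts.values())
        result = {}
        for sense, n in self.sense_counts.items():
            denom = sum(self.token_counts[sense].values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior of the sense
            for token in tokens:
                score += math.log((self.token_counts[sense][token] + self.alpha) / denom)
            result[sense] = score
        return result


def classify(text, tokens, voter, fallback, margin=1.0):
    """Run the layers in sequence; return (sense, number of the layer that decided)."""
    sense = layer1(text)
    if sense is not None:
        return sense, 1
    ranked = sorted(voter.scores(tokens).items(), key=lambda kv: kv[1], reverse=True)
    # Confidence (assumed): gap between the best and second-best log scores.
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= margin:
        return ranked[0][0], 2
    # Layer 3 (assumed form): hand low-confidence cases to an alternative model.
    return fallback(tokens), 3


# Toy training data: context tokens around "/" paired with a hypothetical sense label.
voter = VotingLayer()
voter.fit([
    (["meeting", "on"], "date"),
    (["born", "on"], "date"),
    (["ratio", "of"], "fraction"),
    (["about", "ratio"], "fraction"),
])


def default_sense(tokens):
    """Placeholder alternative model: always answers with one fixed sense."""
    return "fraction"
```

In this sketch, `classify("2003/5/12", [], voter, default_sense)` is settled by the pattern in layer 1, while `classify("3/4", ["ratio", "of"], voter, default_sense)` falls through to the voting layer. A context containing only unseen tokens yields tied scores, so it is passed on to the layer 3 placeholder.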