This paper proposes a novel approach to automating the construction of transliterated-term lexicons. A simple syllable alignment algorithm is used to construct confusion matrices for cross-language syllable-phoneme conversion. Each row in the confusion matrix consists of a set of syllables in the source language that are (correctly or erroneously) matched phonetically and statistically to a syllable in the target language. Two conversions using phoneme-to-phoneme and text-to-phoneme syllabification algorithms are automatically deduced from a training corpus of paired terms and are used to calculate the degree of similarity between phonemes for transliterated-term extraction. In a large-scale experiment using this automated learning process for conversions, more than 200,000 transliterated-term pairs were successfully extracted by analyzing query results from Internet search engines. Experimental results indicate the proposed approach shows promise in transliterated-term extraction.
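The confusion-matrix idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the syllables of each training pair are already aligned position by position (the paper's own alignment algorithm is not reproduced here), and the function names `build_confusion_matrix` and `similarity`, the averaging score, and the toy romanized syllables are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def build_confusion_matrix(aligned_pairs):
    """Estimate P(source syllable | target syllable) from aligned term pairs.

    aligned_pairs: iterable of (source_syllables, target_syllables), two lists
    assumed (for this sketch) to be aligned one-to-one by position.
    Returns {target_syllable: {source_syllable: probability}}; each inner
    dict plays the role of one row of the confusion matrix.
    """
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        if len(source) != len(target):
            continue  # this sketch skips pairs it cannot align one-to-one
        for s, t in zip(source, target):
            counts[t][s] += 1

    matrix = {}
    for t, row in counts.items():
        total = sum(row.values())
        matrix[t] = {s: n / total for s, n in row.items()}
    return matrix


def similarity(matrix, source, target):
    """Average per-syllable probability; an assumed, simplified score."""
    if not source or len(source) != len(target):
        return 0.0
    probs = [matrix.get(t, {}).get(s, 0.0) for s, t in zip(source, target)]
    return sum(probs) / len(probs)


# Hypothetical toy training data: (source syllables, target syllables).
TRAINING = [
    (["bu", "shi"], ["bu", "sh"]),
    (["bu", "lai"], ["b", "lai"]),
    (["pu", "shi"], ["bu", "sh"]),
]

matrix = build_confusion_matrix(TRAINING)
score = similarity(matrix, ["pu", "shi"], ["bu", "sh"])
```

In this toy data the target syllable "bu" is matched to both "bu" and "pu", so its row spreads probability across two source syllables. A candidate pair extracted from search results could then be kept or discarded by comparing such a score against a threshold, which would have to be tuned on real data.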