Measuring similarity between transliterations against noise data

Authors:
Chung-Chian Hsu;Chien-Hsing Chen;Tien-Teng Shih;Chun-Kai Chen
Affiliations:
National Yunlin University of Science and Technology, Taiwan, R.O.C.;National Yunlin University of Science and Technology, Taiwan, R.O.C.;National Yunlin University of Science and Technology, Taiwan, R.O.C.;National Yunlin University of Science and Technology, Taiwan, R.O.C.
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2007

Abstract

When editors of newspapers and magazines translate proper nouns from foreign languages into Chinese, the Chinese translations (termed transliterations) they choose are typically phonetically similar to the original word. Because many translators work without a common standard, the same proper noun may have many different Chinese transliterations, some using the same sounds but different Chinese characters and others using different sounds and characters. This confuses readers and, more importantly, leads to incomplete Chinese Web search results. This article investigates similarity comparison of transliterations as a first step toward solving the incomplete search problem. We devise a method based on comparing digitized Chinese character (Hanzi) sounds. We compare it with four other methods based on grapheme or phoneme similarity, evaluating how well each identifies synonymous transliterations against noise words taken from Web pages. Experimental results indicate that our method surpasses the other methods because its sound vectors contain more discriminative information. The second-best method is based on a scheme that assigns similarity between phonemes by carefully considering their articulatory features, including using multivalued features and placing different weights on the features. Among six pinyin schemes used to romanize Chinese transliterations, the Tongyong scheme outperforms the others.
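
The phoneme-based scheme mentioned in the abstract (multivalued articulatory features with per-feature weights) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the feature table, the weights, the gap cost, and the function names `phoneme_distance` and `sequence_similarity` are hypothetical and are not taken from the article. The sketch aligns two phoneme sequences with a standard edit-distance recurrence in which the substitution cost is the weighted feature distance.

```python
# Hypothetical multivalued articulatory features, each scaled to [0, 1].
# These values and weights are illustrative only, not the article's.
FEATURES = {
    "p": {"place": 0.0, "manner": 0.0, "voice": 0.0},
    "b": {"place": 0.0, "manner": 0.0, "voice": 1.0},
    "t": {"place": 0.5, "manner": 0.0, "voice": 0.0},
    "d": {"place": 0.5, "manner": 0.0, "voice": 1.0},
    "s": {"place": 0.5, "manner": 0.5, "voice": 0.0},
    "m": {"place": 0.0, "manner": 1.0, "voice": 1.0},
}
WEIGHTS = {"place": 2.0, "manner": 3.0, "voice": 1.0}
GAP_COST = 1.0  # assumed cost of inserting or deleting one phoneme


def phoneme_distance(a, b):
    """Weighted feature difference between two phonemes, in [0, 1]."""
    fa, fb = FEATURES[a], FEATURES[b]
    total = sum(WEIGHTS.values())
    return sum(w * abs(fa[f] - fb[f]) for f, w in WEIGHTS.items()) / total


def sequence_similarity(xs, ys):
    """Similarity in [0, 1] from an edit-distance alignment of phonemes."""
    n, m = len(xs), len(ys)
    if n == 0 and m == 0:
        return 1.0
    # dp[i][j] = cheapest alignment cost of xs[:i] with ys[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP_COST
    for j in range(1, m + 1):
        dp[0][j] = j * GAP_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + GAP_COST,  # delete xs[i-1]
                dp[i][j - 1] + GAP_COST,  # insert ys[j-1]
                dp[i - 1][j - 1] + phoneme_distance(xs[i - 1], ys[j - 1]),
            )
    # The cost never exceeds max(n, m), so the result stays in [0, 1].
    return 1.0 - dp[n][m] / max(n, m)


if __name__ == "__main__":
    print(sequence_similarity(["p", "t"], ["b", "d"]))
    print(sequence_similarity(["p", "t"], ["m", "s"]))
```

Under these assumed values, sequences that differ only in voicing score higher than sequences that differ in manner of articulation, which is the kind of graded judgment that weighted features are meant to provide.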