When editors of newspapers and magazines translate proper nouns from foreign languages into Chinese, the Chinese translations they choose (termed transliterations) are typically phonetically similar to the original word. Because many translators work without a common standard, the same proper noun may have many different Chinese transliterations: some use the same sounds but different Chinese characters, and others differ in both sounds and characters. This confuses readers and, more importantly, leads to incomplete Chinese Web search results. This article investigates the similarity comparison of transliterations as a first step toward solving the incomplete search problem. We devise a method based on comparing digitalized Chinese character (Hanzi) sounds. We evaluate it alongside four other methods based on grapheme or phoneme similarity, comparing how well each identifies synonymous transliterations among noise words taken from Web pages. Experimental results indicate that our method surpasses the other methods because its sound vectors contain more discriminative information. The second-best method is based on a scheme that assigns similarity between phonemes by carefully considering their articulatory features, using multivalued features and placing different weights on the features. Among the six pinyin schemes used to romanize Chinese transliterations, the Tongyong scheme outperforms the others.
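To make the idea of a grapheme-based comparison concrete, the following is a minimal sketch of one plausible baseline: a normalized edit-distance similarity between the romanized (pinyin) forms of two transliterations. It is an illustration only and is not claimed to be any of the methods evaluated in the article. The function names, the normalization by the longer string's length, and the toneless Hanyu Pinyin example strings are all assumptions introduced here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def grapheme_similarity(a: str, b: str) -> float:
    """Hypothetical similarity in [0, 1]: 1 minus the normalized edit distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    # Two common Chinese transliterations of "Bush", romanized without tones
    # (illustrative inputs, not data from the article).
    print(grapheme_similarity("buxi", "bushi"))
```

A score near 1 suggests the two romanizations are likely variants of the same name, while unrelated noise words would tend to score lower. A sound-based method of the kind the article proposes would replace the character-level comparison with a comparison of sound representations, which this sketch does not attempt.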