Mining Synonymous Transliterations from the World Wide Web

Authors:
Chung-Chian Hsu;Chien-Hsing Chen
Affiliations:
National Yunlin University of Science and Technology;National Yunlin University of Science and Technology
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2010

Abstract

The World Wide Web is an important source of information, and search engines can retrieve large numbers of Web pages, including pages of foreign origin. To make such content easier for local readers to understand, proper names in a foreign language such as English are often transliterated into a local language such as Chinese. Because translators differ and there is no translation standard, the same foreign proper noun may end up with several different transliterations. This variation is a well-known problem; in particular, it can cause incomplete search results: a query that uses one transliteration fails to retrieve Web pages that use a different one, so important information may be missed. We present a framework for mining as many synonymous transliterations as possible from the Web for a given transliteration. The results can be used to construct a database of synonymous transliterations, which can in turn be used for query expansion to alleviate the incomplete search problem. Experimental results show that the proposed framework can effectively retrieve the set of snippets that may contain synonymous transliterations and then extract the target terms. Most of the extracted synonymous transliterations rank higher in similarity to the input transliteration than noise terms do.
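
The two ideas in the abstract, ranking extracted candidate terms by similarity to the input transliteration and using a database of synonymous transliterations for query expansion, can be illustrated with a minimal Python sketch. The abstract does not specify the similarity measure or the database layout, so everything here is an assumption for illustration: the character-overlap ratio from `difflib.SequenceMatcher` stands in for whatever measure the framework uses, and `rank_candidates`, `expand_query`, and the sample terms are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-overlap ratio in [0, 1].

    Assumption: a stand-in measure only; the abstract does not say
    which similarity measure the framework actually uses.
    """
    return SequenceMatcher(None, a, b).ratio()


def rank_candidates(query: str, candidates: list[str]) -> list[str]:
    """Order candidate terms from most to least similar to the query."""
    return sorted(candidates, key=lambda c: similarity(query, c), reverse=True)


def expand_query(term: str, synonym_db: dict[str, list[str]]) -> str:
    """Build an OR query from a term and its known synonymous transliterations.

    synonym_db is a hypothetical in-memory mapping from a transliteration
    to its synonyms.
    """
    variants = [term] + [v for v in synonym_db.get(term, []) if v != term]
    return " OR ".join(f'"{v}"' for v in variants)


# Illustrative data: two Chinese transliterations of "Obama" and one
# unrelated term ("White House") playing the role of noise.
ranked = rank_candidates("奧巴馬", ["白宮", "歐巴馬"])
expanded = expand_query("奧巴馬", {"奧巴馬": ["歐巴馬"]})
print(ranked)
print(expanded)
```

With the sample data, the synonymous transliteration shares two of its three characters with the input and so ranks above the noise term, which shares none. A measure based on pronunciation rather than characters would likely suit transliterations better, since variants often differ in written characters while sounding alike.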