Synonyms extraction using web content focused crawling

Authors:
Chien-Hsing Chen; Chung-Chian Hsu
Affiliations:
National Yunlin University of Science and Technology, Taiwan; National Yunlin University of Science and Technology, Taiwan
Venue:
AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
Year:
2008

Abstract

Documents and Web pages collected from the World Wide Web are considered one of the most important sources of information. Using search engines to retrieve these documents can yield a large amount of information, including information of foreign origin, and facilitates information exchange and knowledge sharing. However, to make such content easier for local readers to understand, foreign words, such as English words, are often translated into the local language, such as Chinese. Because translators differ and no translation standard exists, the same foreign word can end up with several different transliterations, a notorious problem for proper nouns such as personal and geographical names. For example, "Bin Laden" is rendered as "賓拉登" (binladeng) or "本拉登" (benladeng); both are valid synonymous transliterations. In this research, we propose an approach to determining synonymous transliterations by mining Web pages retrieved by a search engine. Experiments show that the proposed approach can effectively extract synonymous transliterations given an input transliteration.