Concept unification of terms in different languages via web mining for Information Retrieval

Authors:
Qing Li;Yuanzhu Peter Chen;Sung-Hyon Myaeng;Yun Jin;Bo-Yeong Kang
Affiliations:
Southwestern University of Finance and Economics, China;Memorial University of Newfoundland, St. John's, Canada;Information and Communications University, Daejeon, Republic of Korea;Information and Communications University, Daejeon, Republic of Korea;Seoul National University, Seoul, Republic of Korea
Venue:
Information Processing and Management: an International Journal
Year:
2009

Citing 12
Cited 1

Using statistical testing in the evaluation of retrieval experiments

SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Using n-grams for Korean text retrieval

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Effective foreign word extration for Korean information retrieval

Information Processing and Management: an International Journal
An IR approach for translating new words from nonparallel, comparable texts

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Anchor text mining for translation of Web queries: A transitive translation approach

ACM Transactions on Information Systems (TOIS)
Translating unknown queries with web corpora for cross-language information retrieval

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Using the web for automated translation extraction in cross-language information retrieval

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Using mutual information to resolve query translation ambiguities and query term weighting

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Mining the Web for bilingual text

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Information retrieval system evaluation: effort, sensitivity, and reliability

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Mining translations of OOV terms from the web through cross-lingual query expansion

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Using the web as a bilingual dictionary

DMMT '01 Proceedings of the workshop on Data-driven methods in machine translation - Volume 14

Learning inter-related statistical query translation models for English-Chinese bi-directional CLIR

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three

Quantified Score

Hi-index	0.00

Visualization

Abstract

For historical and cultural reasons, English phases, especially proper nouns and new words, frequently appear in Web pages written primarily in East Asian languages such as Chinese, Korean, and Japanese. Although such English terms and their equivalences in these East Asian languages refer to the same concept, they are often erroneously treated as independent index units in traditional Information Retrieval (IR). This paper describes the degree to which the problem arises in IR and proposes a novel technique to solve it. Our method first extracts English terms from native Web documents in an East Asian language, and then unifies the extracted terms and their equivalences in the native language as one index unit. For Cross-Language Information Retrieval (CLIR), one of the major hindrances to achieving retrieval performance at the level of Mono-Lingual Information Retrieval (MLIR) is the translation of terms in search queries which can not be found in a bilingual dictionary. The Web mining approach proposed in this paper for concept unification of terms in different languages can also be applied to solve this well-known challenge in CLIR. Experimental results based on NTCIR and KT-Set test collections show that the high translation precision of our approach greatly improves performance of both Mono-Lingual and Cross-Language Information Retrieval.