Adaptive web mining of bilingual lexicons for cross language information retrieval

Authors:
Lei Shi
Affiliations:
Yahoo! Software R&D Beijing, Beijing, China
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009



Abstract

Bilingual web pages contain abundant term translation knowledge, which is crucial for query translation in Cross Language Information Retrieval systems. However, extracting term translations from bilingual web pages is challenging because page layouts and writing styles vary widely. Building on the observation that translation pairs on the same web page tend to follow similar patterns, this paper proposes a new extraction model that adaptively learns extraction patterns and exploits them to mine term translations from bilingual web pages. Experiments show that the model significantly improves extraction coverage while maintaining high accuracy. It also improves query translation in cross-language information retrieval, leading to significantly higher retrieval effectiveness on TREC collections.
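
The idea that translation pairs on one page tend to share a surface pattern can be illustrated with a small sketch. This is not the extraction model proposed in the paper, whose details the abstract does not give. It is a toy approximation that assumes Chinese-English pages, a handful of known seed pairs, and a pattern defined simply as the short separator string between a Chinese term and the English phrase that follows it. The names CANDIDATE, learn_patterns, extract_pairs, PAGE and SEEDS are hypothetical.

```python
import re
from collections import Counter

# Candidate pair: a run of CJK characters, a short non-word separator,
# then an English phrase. The separator is the "pattern" in this toy sketch.
CANDIDATE = re.compile(
    r"([\u4e00-\u9fff]+)([^\w\u4e00-\u9fff]{1,3})([A-Za-z][A-Za-z ]*[A-Za-z])"
)

def learn_patterns(page, seeds, min_support=2):
    """Return separators that link at least min_support known seed pairs on this page."""
    counts = Counter()
    for zh, sep, en in CANDIDATE.findall(page):
        if (zh, en.lower()) in seeds:
            counts[sep] += 1
    return {sep for sep, n in counts.items() if n >= min_support}

def extract_pairs(page, patterns):
    """Return every candidate pair on the page whose separator was learned for this page."""
    return {
        (zh, en)
        for zh, sep, en in CANDIDATE.findall(page)
        if sep in patterns
    }

# Hypothetical page text and seed dictionary, for illustration only.
PAGE = (
    "信息检索 (information retrieval) 是一门学科。"
    "机器翻译 (machine translation) 也很重要。"
    "查询扩展 (query expansion) 作者:Lei Shi"
)
SEEDS = {
    ("信息检索", "information retrieval"),
    ("机器翻译", "machine translation"),
}

patterns = learn_patterns(PAGE, SEEDS)
new_pairs = extract_pairs(PAGE, patterns) - SEEDS
```

In this sketch the patterns are learned per page. The two seed pairs support the " (" separator, so the previously unknown pair for 查询扩展 is accepted, while the byline candidate that uses ":" is rejected because no seed supports that separator on this page.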