Bilingual web pages contain abundant term translation knowledge, which is crucial for query translation in cross-language information retrieval (CLIR) systems. However, extracting term translations from bilingual web pages is challenging because page layouts and writing styles vary. Based on the observation that translation pairs on the same web page tend to follow similar patterns, this paper proposes a new extraction model that adaptively learns extraction patterns and exploits them to mine term translations from bilingual web pages. Experiments show that the model significantly improves extraction coverage while maintaining high accuracy. It also improves query translation for cross-language information retrieval, yielding significantly higher retrieval effectiveness on TREC collections.
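The abstract does not specify the model itself, so the following is only a minimal sketch of the general idea under stated assumptions. A few known translation pairs on a page ("seeds") are used to learn the literal separator strings that surround them. Those separators are then reused to extract further pairs from the same page. The English/Chinese language pair, the term character classes, the `max_gap` and `min_support` parameters, and the function names `learn_patterns` and `apply_patterns` are all hypothetical. None of them come from the paper.

```python
import re
from collections import Counter

# Hypothetical term shapes (an assumption for this sketch): English terms are
# runs of Latin letters, spaces, and hyphens; Chinese terms are runs of CJK ideographs.
EN = r"[A-Za-z][A-Za-z\- ]*[A-Za-z]"
ZH = "[\u4e00-\u9fff]+"


def learn_patterns(page, seeds, max_gap=5):
    """Count (infix, suffix) strings surrounding known seed pairs on one page."""
    patterns = Counter()
    for en, zh in seeds:
        # English seed, a short lazily matched infix, the Chinese seed, then one suffix char.
        regex = re.escape(en) + r"(.{0,%d}?)" % max_gap + re.escape(zh) + r"(.?)"
        for m in re.finditer(regex, page):
            patterns[(m.group(1), m.group(2))] += 1
    return patterns


def apply_patterns(page, patterns, min_support=1):
    """Reuse learned separators on the same page to extract candidate pairs."""
    pairs = set()
    for (infix, suffix), support in patterns.items():
        if support < min_support:
            continue  # skip separators seen too rarely to trust
        regex = "(" + EN + ")" + re.escape(infix) + "(" + ZH + ")" + re.escape(suffix)
        for m in re.finditer(regex, page):
            pairs.add((m.group(1), m.group(2)))
    return sorted(pairs)


if __name__ == "__main__":
    page = "Peking University (北京大学); Tsinghua University (清华大学); Fudan University (复旦大学)"
    seeds = [("Peking University", "北京大学")]
    for en, zh in apply_patterns(page, learn_patterns(page, seeds)):
        print(en, "->", zh)
```

Patterns are learned and applied per page, which mirrors the abstract's observation that pairs on the same page tend to share a layout. The sketch's term-boundary detection is deliberately crude. For example, the greedy English pattern would also absorb a preceding word, as in "Visit Peking University". A practical system would need a more careful way to decide where a term starts and ends.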