Mining OOV translations from mixed-language web pages for cross language information retrieval

Authors:
Lei Shi
Affiliations:
Yahoo Software RSD (Beijing), Beijing, China
Venue:
ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
Year:
2010

Abstract

Translating Out-Of-Vocabulary (OOV) terms is crucial for Cross-Language Information Retrieval (CLIR). In this paper, we propose a method that automatically acquires a large quantity of OOV translations from the web. Unlike previous approaches that rely on a finite set of hand-crafted extraction rules, our method adaptively learns translation extraction patterns, based on the observation that translation pairs on the same page tend to follow similar layout patterns. The learned patterns are used in a discriminative translation extraction model that treats extraction from a mixed-language bilingual web page as a sequence labeling task, which lets the model exploit useful relations among the translation pairs on that page. Experiments demonstrate that our method outperforms earlier work, with a marked improvement in OOV translation mining quality.
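The core observation (translation pairs on one page share a layout) can be illustrated with a toy sketch. The code below is not the paper's method: it assumes a small seed dictionary, a Chinese-English page already split into lines, and a simple (prefix, infix, suffix) template learned around each seed pair, then reapplied as a regular expression to harvest other pairs from the same page. The names `learn_patterns`, `extract_pairs`, `SEEDS`, and `PAGE` are hypothetical, and the discriminative sequence-labeling model described in the abstract is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical toy input: one mixed-language page, pre-split into lines.
PAGE = [
    "<li>苹果 (apple)</li>",
    "<li>香蕉 (banana)</li>",
    "<li>机器翻译 (machine translation)</li>",
    "<li>联系我们 (010-1234)</li>",
    "<p>Copyright 2010</p>",
]
# Hypothetical seed dictionary of already-known translation pairs.
SEEDS = {"苹果": "apple"}

# Assumed target-side filter: a Latin-script word or phrase.
LATIN = re.compile(r"[A-Za-z][A-Za-z '\-]*")


def has_cjk(text):
    """True if the text contains a CJK Unified Ideograph (assumed source side)."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def learn_patterns(lines, seeds):
    """Count (prefix, infix, suffix) layout templates around seed pairs."""
    counts = Counter()
    for line in lines:
        for src, tgt in seeds.items():
            i = line.find(src)
            if i == -1:
                continue
            # Target must appear after the source term on the same line.
            j = line.find(tgt, i + len(src))
            if j == -1:
                continue
            counts[(line[:i], line[i + len(src):j], line[j + len(tgt):])] += 1
    return counts


def extract_pairs(lines, patterns):
    """Apply learned templates, most frequent first, to harvest new pairs."""
    pairs = []
    for (prefix, infix, suffix), _count in patterns.most_common():
        rx = re.compile(
            "^" + re.escape(prefix) + "(.+?)" + re.escape(infix)
            + "(.+?)" + re.escape(suffix) + "$"
        )
        for line in lines:
            m = rx.match(line)
            # Keep only CJK-source / Latin-target candidates.
            if m and has_cjk(m.group(1)) and LATIN.fullmatch(m.group(2)):
                pair = (m.group(1), m.group(2))
                if pair not in pairs:
                    pairs.append(pair)
    return pairs


if __name__ == "__main__":
    learned = learn_patterns(PAGE, SEEDS)
    for src, tgt in extract_pairs(PAGE, learned):
        print(src, "->", tgt)
```

From the single seed pair, the sketch learns the template `<li>… (…)</li>` and extracts the banana and machine-translation pairs, while the script filter rejects the phone-number line. A fixed template like this breaks as soon as the layout varies slightly, which is one reason a learned, discriminative model that labels pairs jointly across the page is a more robust choice than hand-written rules.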