Translating out-of-vocabulary (OOV) terms is crucial for cross-language information retrieval (CLIR). In this paper, we propose a method that automatically acquires a large number of OOV translations from the web. Unlike previous approaches, which rely on a finite set of hand-crafted extraction rules, our method adaptively learns translation extraction patterns, based on the observation that translation pairs on the same page tend to follow similar layout patterns. The learned patterns are used in a discriminative translation extraction model that treats extraction from a mixed-language bilingual web page as a sequence labeling task, so that useful relations among the translation pairs on the page can be exploited. Experiments show that our method outperforms earlier work, with marked improvements in OOV translation mining quality.
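The key observation above, that translation pairs on one page tend to share a layout, can be illustrated with a deliberately simple sketch. It assumes a plain-text rendering of the page and one known seed pair. The hypothetical helpers `learn_layout` and `extract_pairs` infer the literal text surrounding the seed and reuse it to harvest further pairs. This is not the paper's method. The paper feeds learned patterns into a discriminative sequence-labeling model, whereas this sketch applies a single pattern as a regular expression. The character classes `SRC` (Latin-script source) and `TGT` (CJK target) are also assumptions made for illustration.

```python
import re

# Assumed character classes for this illustration only: Latin-script
# source terms and CJK (Chinese) target terms.
SRC = r"[A-Za-z][A-Za-z \-]*?"
TGT = r"[\u4e00-\u9fff]+"


def learn_layout(page: str, src: str, tgt: str, suffix_len: int = 1) -> tuple[str, str]:
    """Infer a layout template from one known (seed) translation pair.

    Returns the literal text between the source and target terms (infix)
    and the `suffix_len` characters that follow the target (suffix).
    Raises ValueError if the seed pair does not occur on the page.
    """
    i = page.index(src)
    j = page.index(tgt, i + len(src))
    end = j + len(tgt)
    return page[i + len(src):j], page[end:end + suffix_len]


def extract_pairs(page: str, infix: str, suffix: str) -> list[tuple[str, str]]:
    """Apply the learned template to harvest other pairs on the same page."""
    pattern = re.compile(rf"({SRC}){re.escape(infix)}({TGT}){re.escape(suffix)}")
    return [m.groups() for m in pattern.finditer(page)]


if __name__ == "__main__":
    # Hypothetical mixed-language page text.
    page = "Products: Notebook (笔记本电脑); Router (路由器); Webcam (网络摄像头)"
    infix, suffix = learn_layout(page, "Notebook", "笔记本电脑")
    print(extract_pairs(page, infix, suffix))
    # [('Notebook', '笔记本电脑'), ('Router', '路由器'), ('Webcam', '网络摄像头')]
```

A single regular expression like this also misjudges term boundaries. For example, "Buy Router (路由器)" yields the source term "Buy Router", and the sketch makes no attempt to resolve such cases.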