Web-based terminology translation mining

Authors:
Gaolin Fang;Hao Yu;Fumihito Nishino
Affiliations:
Fujitsu Research and Development Center, Co., LTD., Beijing, China;Fujitsu Research and Development Center, Co., LTD., Beijing, China;Fujitsu Research and Development Center, Co., LTD., Beijing, China
Venue:
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Year:
2005


Abstract

Mining terminology translations from large amounts of Web data has applications in many fields, such as reading and writing assistance, machine translation, and cross-language information retrieval. The challenging issues are how to find more comprehensive results on the Web, how to determine the boundaries of candidate translations, and how to remove irrelevant noise and rank the remaining candidates. In this paper, after reviewing and analyzing the possible methods of acquiring translations, we propose a feasible statistics-based method for mining terminology translations from the Web. Based on an analysis of the different forms in which term translations are distributed, the method uses character-based string frequency estimation to construct term translation candidates, which uncovers more translations and their boundaries. Sort-based subset deletion and mutual information methods are then applied to remove, respectively, the subset redundancy and the prefix/suffix redundancy introduced during estimation. Extensive experiments on two test sets of 401 and 3511 English terms validate that our system achieves better performance.
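
The abstract gives no implementation details, so the following is only a minimal sketch of how character-based string frequency estimation and sort-based subset deletion might fit together. It is not the authors' algorithm. The function names, the length limits, the `min_freq` and `ratio` thresholds, and the Chinese example snippets are all assumptions made for illustration. The mutual information step for prefix/suffix redundancy is left out.

```python
from collections import Counter


def count_substrings(snippets, min_len=2, max_len=6):
    """Count, for every character n-gram, the number of snippets containing it.

    Hypothetical helper: the length limits are illustrative assumptions.
    """
    counts = Counter()
    for text in snippets:
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(text) - n + 1):
                seen.add(text[i:i + n])
        counts.update(seen)  # each snippet contributes at most 1 per string
    return counts


def delete_subsets(counts, min_freq=2, ratio=0.8):
    """Drop candidates that are substrings of a longer, similarly frequent one.

    Candidates are sorted longest first, so a short string is compared only
    against longer strings that were already kept. The thresholds are
    assumptions, not values taken from the paper.
    """
    candidates = [(s, f) for s, f in counts.items() if f >= min_freq]
    candidates.sort(key=lambda sf: (-len(sf[0]), -sf[1], sf[0]))
    kept = []
    for s, f in candidates:
        redundant = any(s in k and kf >= ratio * f for k, kf in kept)
        if not redundant:
            kept.append((s, f))
    # Rank the surviving candidates by frequency, highest first.
    kept.sort(key=lambda sf: (-sf[1], sf[0]))
    return kept


if __name__ == "__main__":
    # Invented snippets standing in for search results on "machine translation".
    snippets = ["机器翻译系统", "统计机器翻译", "机器翻译"]
    print(delete_subsets(count_substrings(snippets, 2, 4)))
```

In this toy input every fragment of the four-character string occurs as often as the full string, so only the full string survives. A shorter string that is much more frequent than any longer string containing it would be kept as a separate candidate.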