Ontology Learning and Its Application to Automated Terminology Translation
IEEE Intelligent Systems
Using Bilingual Web Data to Mine and Rank Translations
IEEE Intelligent Systems
An IR approach for translating new words from nonparallel, comparable texts
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Identifying word translations in non-parallel texts
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Extraction of lexical translations from non-aligned corpora
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Automatic identification of word translations from unrelated English and German corpora
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Base Noun Phrase translation using web data and the EM algorithm
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Using the web as a bilingual dictionary
DMMT '01 Proceedings of the workshop on Data-driven methods in machine translation - Volume 14
Chinese-English term translation mining based on semantic prediction
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Hi-index | 0.00 |
Mining terminology translation from a large amount of Web data can be applied in many fields such as reading/writing assistant, machine translation and cross-language information retrieval. How to find more comprehensive results from the Web and obtain the boundary of candidate translations, and how to remove irrelevant noises and rank the remained candidates are the challenging issues. In this paper, after reviewing and analyzing all possible methods of acquiring translations, a feasible statistics-based method is proposed to mine terminology translation from the Web. In the proposed method, on the basis of an analysis of different forms of term translation distributions, character-based string frequency estimation is presented to construct term translation candidates for exploring more translations and their boundaries, and then sort-based subset deletion and mutual information methods are respectively proposed to deal with subset redundancy information and prefix/suffix redundancy information formed in the process of estimation. Extensive experiments on two test sets of 401 and 3511 English terms validate that our system has better performance.