Improving the extraction of bilingual terminology from Wikipedia

Authors:
Maike Erdmann;Kotaro Nakayama;Takahiro Hara;Shojiro Nishio
Affiliations:
Osaka University;The University of Tokyo;Osaka University;Osaka University
Venue:
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Year:
2009

Abstract

Research on the automatic construction of bilingual dictionaries has achieved impressive results. Bilingual dictionaries are usually constructed from parallel corpora, but because such corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. In this article, we further pursue the idea of using Wikipedia as a corpus for bilingual terminology extraction. We propose a method that extracts term-translation pairs from different types of Wikipedia link information. An SVM classifier, trained on features of manually labeled training data, then determines whether unseen term-translation pairs are correct.
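
The first stage described above, collecting candidate term-translation pairs from link information, might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which link types or features are used, so the choice of interlanguage links plus redirect pages, the three binary features, the function names, and the English-German toy data are all hypothetical. The classification stage is omitted; feature dictionaries like these are the kind of input an SVM library would consume once some pairs are manually labeled.

```python
from itertools import product


def term_variants(title, redirects):
    """Return the article title plus every redirect title that points to it."""
    variants = [title]
    variants += sorted(r for r, target in redirects.items() if target == title)
    return variants


def candidate_pairs(interlanguage, src_redirects, tgt_redirects):
    """Build candidate term-translation pairs with simple binary features.

    The features are hypothetical placeholders chosen for illustration.
    """
    candidates = {}
    for src_title, tgt_title in interlanguage.items():
        src_terms = term_variants(src_title, src_redirects)
        tgt_terms = term_variants(tgt_title, tgt_redirects)
        # Every source-side variant is paired with every target-side variant.
        for s, t in product(src_terms, tgt_terms):
            candidates[(s, t)] = {
                "title_pair": int(s == src_title and t == tgt_title),
                "src_redirect": int(s != src_title),
                "tgt_redirect": int(t != tgt_title),
            }
    return candidates


# Toy link data (illustrative only, not taken from the article).
interlanguage = {"Car": "Automobil"}            # English title -> German title
en_redirects = {"Automobile": "Car"}            # redirect title -> article title
de_redirects = {"Auto": "Automobil", "Kraftwagen": "Automobil"}

pairs = candidate_pairs(interlanguage, en_redirects, de_redirects)
for pair, features in pairs.items():
    print(pair, features)
```

In this sketch the pair taken directly from the interlanguage link is flagged separately from pairs that involve a redirect on either side, so that a classifier could learn to weight the two kinds of evidence differently.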