Enriching Multilingual Language Resources by Discovering Missing Cross-Language Links in Wikipedia

Authors:
Jong-Hoon Oh;Daisuke Kawahara;Kiyotaka Uchimoto;Jun'ichi Kazama;Kentaro Torisawa
Venue:
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Year:
2008

Abstract

We present a novel method for discovering missing cross-language links between English and Japanese Wikipedia articles. We first collect candidates for missing cross-language links, that is, pairs of English and Japanese Wikipedia articles that could be connected by a cross-language link. We then select the correct cross-language links among the candidates using a classifier trained with various types of features. Our method has three desirable characteristics for discovering missing links. First, it discovers cross-language links with high accuracy (92% precision at 78% recall). Second, the features used in the classifier are language-independent. Third, the features are generated from resources automatically obtained from Wikipedia, without relying on any external knowledge. In this work, we discover approximately 10^5 missing cross-language links in Wikipedia, almost two-thirds as many as the existing cross-language links in Wikipedia.
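
The two-stage idea in the abstract (collect candidate article pairs, then keep the ones a classifier accepts) can be illustrated with a small Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the single link-overlap feature, the fixed threshold standing in for a trained classifier, the function names, and the toy data. The abstract does not say which features or classifier the authors used.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (English title, Japanese title)


def translated_links(en: str, en_links: Dict[str, Set[str]],
                     known: Dict[str, str]) -> Set[str]:
    """Map an English article's out-links to Japanese via existing links."""
    return {known[t] for t in en_links.get(en, set()) if t in known}


def link_overlap(en: str, ja: str, en_links: Dict[str, Set[str]],
                 ja_links: Dict[str, Set[str]], known: Dict[str, str]) -> float:
    """Hypothetical feature: Jaccard overlap of translated out-links."""
    mapped = translated_links(en, en_links, known)
    ja_out = ja_links.get(ja, set())
    union = mapped | ja_out
    return len(mapped & ja_out) / len(union) if union else 0.0


def collect_candidates(en_links: Dict[str, Set[str]],
                       ja_links: Dict[str, Set[str]],
                       known: Dict[str, str]) -> List[Pair]:
    """Stage 1: pair unlinked articles that share at least one linked neighbour."""
    linked_ja = set(known.values())
    candidates = []
    for en in sorted(en_links):
        if en in known:
            continue  # already has a cross-language link
        for ja in sorted(ja_links):
            if ja in linked_ja:
                continue
            if translated_links(en, en_links, known) & ja_links[ja]:
                candidates.append((en, ja))
    return candidates


def select_links(candidates: List[Pair], en_links: Dict[str, Set[str]],
                 ja_links: Dict[str, Set[str]], known: Dict[str, str],
                 threshold: float = 0.5) -> List[Pair]:
    """Stage 2: a threshold stands in for the trained classifier (assumption)."""
    return [(en, ja) for en, ja in candidates
            if link_overlap(en, ja, en_links, ja_links, known) >= threshold]


def precision_recall(predicted: List[Pair], gold: List[Pair]) -> Tuple[float, float]:
    """Precision and recall of predicted links against a gold set."""
    true_positives = len(set(predicted) & set(gold))
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data (invented for this sketch).
known = {"Japan": "日本", "Volcano": "火山", "Temple": "寺院"}
en_links = {"Mount Fuji": {"Japan", "Volcano"}, "Kyoto": {"Japan", "Temple"}}
ja_links = {"富士山": {"日本", "火山"}, "京都市": {"日本", "寺院"}}
gold = [("Mount Fuji", "富士山"), ("Kyoto", "京都市")]

candidates = collect_candidates(en_links, ja_links, known)
selected = select_links(candidates, en_links, ja_links, known)
precision, recall = precision_recall(selected, gold)
```

On this toy data, stage 1 proposes every pair that shares a linked neighbour, and stage 2 keeps only the pairs whose translated out-links mostly agree. A real system would replace the threshold with a classifier trained on labelled pairs and several features.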