We present a novel method for discovering missing cross-language links between English and Japanese Wikipedia articles. We first collect candidates for missing cross-language links, that is, pairs of English and Japanese Wikipedia articles that could be connected by a cross-language link. We then select the correct cross-language links among the candidates using a classifier trained on various types of features. Our method has three desirable characteristics for discovering missing links. First, it discovers cross-language links with high accuracy (92\% precision at 78\% recall). Second, the features used by the classifier are language-independent. Third, the features are generated from resources obtained automatically from Wikipedia, without relying on any external knowledge. In this work, we discover approximately $10^5$ missing cross-language links in Wikipedia, almost two-thirds as many as the cross-language links that already exist in Wikipedia.
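The two-stage pipeline described above (candidate collection, then selection) can be illustrated with a minimal sketch. The abstract does not specify the features or the classifier, so everything here is an assumption made for illustration: the single feature `link_overlap` (a Jaccard overlap of outgoing links, mapped through already-known cross-language links), the fixed `threshold` standing in for a trained classifier, and the toy article titles such as `ja:Kyoto`. The feature is language-independent in the sense that it uses only link structure, not text.

```python
from typing import Dict, List, Set, Tuple


def link_overlap(en_links: Set[str], ja_links: Set[str], known: Dict[str, str]) -> float:
    """Jaccard overlap of outgoing links, after mapping English link targets
    to Japanese titles via already-known cross-language links (hypothetical feature)."""
    mapped = {known[t] for t in en_links if t in known}
    union = mapped | ja_links
    return len(mapped & ja_links) / len(union) if union else 0.0


def candidate_pairs(
    en_out: Dict[str, Set[str]],
    ja_out: Dict[str, Set[str]],
    known: Dict[str, str],
) -> List[Tuple[str, str, float]]:
    """Pair every unlinked English article with every unlinked Japanese article
    that shares at least one mapped outgoing link; highest score first."""
    linked_en, linked_ja = set(known), set(known.values())
    pairs = []
    for en, en_links in en_out.items():
        if en in linked_en:
            continue
        for ja, ja_links in ja_out.items():
            if ja in linked_ja:
                continue
            score = link_overlap(en_links, ja_links, known)
            if score > 0:
                pairs.append((en, ja, score))
    return sorted(pairs, key=lambda p: -p[2])


def select_links(pairs: List[Tuple[str, str, float]], threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Stand-in for the trained classifier: accept pairs whose score passes a fixed threshold."""
    return [(en, ja) for en, ja, score in pairs if score >= threshold]


# Toy data (hypothetical): existing cross-language links and outgoing links per article.
known = {"Tokyo": "ja:Tokyo", "Japan": "ja:Japan", "Sushi": "ja:Sushi"}
en_out = {"Kyoto": {"Japan", "Tokyo"}, "Ramen": {"Japan", "Sushi"}}
ja_out = {"ja:Kyoto": {"ja:Japan", "ja:Tokyo"}, "ja:Ramen": {"ja:Japan", "ja:Sushi"}}

candidates = candidate_pairs(en_out, ja_out, known)
selected = select_links(candidates)
print(selected)
```

In a realistic setting the threshold would be replaced by a classifier trained on labeled pairs with several such features, and candidate collection would need to avoid the all-pairs loop, which does not scale to full Wikipedia.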