Identifying multilingual Wikipedia articles based on cross language similarity and activity

  • Authors:
  • Khoi-Nguyen Tran;Peter Christen

  • Affiliations:
  • The Australian National University, Canberra, Australia;The Australian National University, Canberra, Australia

  • Venue:
  • Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Wikipedia is an online free and open access encyclopedia available in many languages. Wikipedia articles across over 280 languages are written by millions of editors. However, the growth of articles and their content is slowing, especially within the largest Wikipedia language: English. The stabilization of articles presents opportunities for multilingual Wikipedia editors to apply their translation skills to add articles and content to smaller Wikipedia languages. In this poster, we propose similarity and activity measures of Wikipedia articles across two languages: English and German. These measures allow us to evaluate the distribution of articles based on their knowledge coverage and their activity across languages. We show the state of Wikipedia articles as of June 2012 and discuss how these measures allow us to develop recommendation and verification models for multilingual editors to enrich articles and content in Wikipedia languages with relatively smaller knowledge coverage.