Transforming Wikipedia into a large scale multilingual concept network

  • Authors: Vivi Nastase; Michael Strube

  • Affiliations: HITS gGmbH, Heidelberg, Germany; HITS gGmbH, Heidelberg, Germany

  • Venue: Artificial Intelligence
  • Year: 2013

Abstract

A knowledge base for real-world language processing applications should consist of a large base of facts and reasoning mechanisms that combine them to induce novel and more complex information. This paper describes an approach to deriving such a large scale and multilingual resource by exploiting several facets of the on-line encyclopedia Wikipedia. We show how we can build upon Wikipedia's existing network of categories and articles to automatically discover new relations and their instances. Working on top of this network allows for added information to influence the network and be propagated throughout it using inference mechanisms that connect different pieces of existing knowledge. We then exploit this gained information to discover new relations that refine some of those found in the previous step. The result is a network containing approximately 3.7 million concepts with lexicalizations in numerous languages and 49+ million relation instances. Intrinsic and extrinsic evaluations show that this is a high quality resource and beneficial to various NLP tasks.