Wikipedia-based WSD for multilingual frame annotation

  • Authors:
  • Sara Tonelli; Claudio Giuliano; Kateryna Tymoshenko

  • Affiliations:
  • Fondazione Bruno Kessler, via Sommarive 18, I-38100 Trento, Italy (all authors)

  • Venue:
  • Artificial Intelligence
  • Year:
  • 2013

Abstract

Many natural language processing applications have been shown to achieve significant performance gains when they exploit semantic information extracted from high-quality annotated resources. However, the practical use of such resources is often hampered by their limited coverage. Furthermore, they are generally available only for English and a few other languages. We propose a novel methodology that, starting from a mapping between FrameNet lexical units and Wikipedia pages, automatically extracts new lexical units and example sentences from Wikipedia. The goal is to build a reference data set for the semi-automatic development of new FrameNets. In addition, this methodology can be adapted to perform frame identification in any language available in Wikipedia. Our approach relies on a state-of-the-art word sense disambiguation system that is first trained on English Wikipedia to assign a page to the lexical units in a frame. This mapping is then exploited to perform frame identification in English or in any other language available in Wikipedia. Our approach shows high potential in multilingual settings, because it can be applied to languages for which other lexical resources, such as WordNet or thesauri, are not available.
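
The lookup step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the word sense disambiguation step has already assigned a Wikipedia page to a target word, and it assumes that pages in other languages are connected to English pages through interlanguage links. The names LU_TO_PAGE, INTERLANGUAGE, build_page_to_frames and identify_frames, as well as the toy entries, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy mapping: (frame, lexical unit) -> English Wikipedia page.
# In the described approach this mapping would be produced by a WSD system;
# here it is hard-coded for illustration only.
LU_TO_PAGE = {
    ("Commerce_buy", "purchase.n"): "Purchasing",
    ("Education_teaching", "teacher.n"): "Teacher",
    ("Education_teaching", "school.v"): "Education",
}

# Hypothetical interlanguage links: (language, local page) -> English page.
INTERLANGUAGE = {
    ("it", "Insegnante"): "Teacher",
    ("it", "Istruzione"): "Education",
}


def build_page_to_frames(lu_to_page):
    """Invert the LU-to-page mapping so each page lists its candidate frames."""
    page_to_frames = defaultdict(set)
    for (frame, _lexical_unit), page in lu_to_page.items():
        page_to_frames[page].add(frame)
    return dict(page_to_frames)


def identify_frames(page, lang, page_to_frames, interlanguage):
    """Return candidate frames for a page already chosen by a WSD step."""
    # Non-English pages are first projected onto their English counterpart.
    english_page = page if lang == "en" else interlanguage.get((lang, page))
    if english_page is None:
        return set()
    return set(page_to_frames.get(english_page, set()))


PAGE_TO_FRAMES = build_page_to_frames(LU_TO_PAGE)
```

Under these assumptions, a call such as identify_frames("Insegnante", "it", PAGE_TO_FRAMES, INTERLANGUAGE) follows the interlanguage link to the English page and returns the frames mapped to it, which is why no WordNet-style resource is needed for the target language.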