Word Sense Disambiguation Using Inductive Logic Programming

  • Authors:
  • Lucia Specia; Ashwin Srinivasan; Ganesh Ramakrishnan; Maria Das Volpe Nunes

  • Affiliations:
  • ICMC - University of São Paulo, Trabalhador São-Carlense, 400, São Carlos, 13560-970, Brazil; IBM India Research Laboratory, Block 1, Indian Institute of Technology, New Delhi 110016, India and Dept. of Computer Science and Engineering & Centre for Health Informatics, University of New Sou ...; IBM India Research Laboratory, Block 1, Indian Institute of Technology, New Delhi 110016, India; ICMC - University of São Paulo, Trabalhador São-Carlense, 400, São Carlos, 13560-970, Brazil

  • Venue:
  • Inductive Logic Programming
  • Year:
  • 2007

Abstract

The identification of the correct sense of a word is necessary for many tasks in automatic natural language processing, such as machine translation, information retrieval, and speech and text processing. Automatic Word Sense Disambiguation (WSD) is difficult, and the accuracies of state-of-the-art methods are substantially lower than in other areas of text understanding, such as part-of-speech tagging. One shortcoming of these methods is that they rely largely on shallow syntactic features and do not exploit substantial sources of background knowledge, such as semantic taxonomies and dictionaries, that are now available in electronic form. Empirical results from the use of Inductive Logic Programming (ILP) have repeatedly shown the ability of ILP systems to use diverse sources of background knowledge. In this paper we investigate the use of ILP for WSD in two different ways: (a) as a stand-alone constructor of models for WSD; and (b) to construct features that can then be used by standard model-builders such as Support Vector Machines (SVMs). In our experiments we examine a monolingual WSD task using the 32 English verbs contained in the SENSEVAL-3 benchmark data, and a bilingual WSD task using 7 highly ambiguous verbs in machine translation from English to Portuguese. The available background knowledge comes from eight sources that provide a wide range of syntactic and semantic information. For both WSD tasks, experimental results show that ILP-constructed models and models built using ILP-generated features have higher accuracies than those obtained using a state-of-the-art feature-based technique equipped with shallow syntactic features. This suggests that the use of ILP with diverse sources of background knowledge can provide one way of making substantial progress in the field of automatic WSD.
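To make approach (b) more concrete, here is a minimal sketch of turning learned clauses into features. It assumes a simplified, propositional view in which each clause an ILP system might learn is a conjunction of attribute tests. An example receives a 1 for a feature if every test in that clause holds, and a 0 otherwise. The attribute names (`subj_cat`, `obj_lemma`, `prep`) and the helpers `has`, `clause_holds` and `to_features` are hypothetical illustrations, not taken from the paper. Real ILP clauses are first-order and can relate several objects through shared variables, which this flattening ignores. The resulting binary vectors are what one would hand to a standard learner such as an SVM (not shown here).

```python
from typing import Callable, Dict, List

# Hypothetical representation: one occurrence of an ambiguous verb described by
# background facts flattened into attribute/value pairs (illustrative names only).
Example = Dict[str, str]
Literal = Callable[[Example], bool]
Clause = List[Literal]  # a clause body = conjunction of literals


def has(attr: str, value: str) -> Literal:
    """Hypothetical background predicate: true if the example has attr == value."""
    return lambda ex: ex.get(attr) == value


def clause_holds(clause: Clause, ex: Example) -> bool:
    """A clause body is satisfied only if every literal in it is true."""
    return all(lit(ex) for lit in clause)


def to_features(clauses: List[Clause], examples: List[Example]) -> List[List[int]]:
    """One binary feature per clause: 1 if its body holds for the example, else 0."""
    return [[int(clause_holds(c, ex)) for c in clauses] for ex in examples]


# Illustrative clauses, standing in for rules an ILP system might produce, e.g.
#   sense(E, s1) :- subj_cat(E, person), obj_lemma(E, ball).
clauses = [
    [has("subj_cat", "person"), has("obj_lemma", "ball")],
    [has("prep", "over")],
]
examples = [
    {"subj_cat": "person", "obj_lemma": "ball", "prep": "none"},
    {"subj_cat": "animal", "obj_lemma": "fence", "prep": "over"},
]

print(to_features(clauses, examples))  # → [[1, 0], [0, 1]]
```

Each column of the output corresponds to one clause. A feature-based learner can then weight clauses against one another rather than relying on any single rule, which is the motivation for combining ILP-generated features with a model-builder such as an SVM.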