The JIGSAW Algorithm for Word Sense Disambiguation and Semantic Indexing of Documents

  • Authors:
  • P. Basile;M. Degemmis;A. L. Gentile;P. Lops;G. Semeraro

  • Affiliations:
  • Dipartimento di Informatica, Università di Bari, Via E. Orabona, 4 - 70125 Bari, Italia (all authors)

  • Venue:
  • AI*IA '07 Proceedings of the 10th Congress of the Italian Association for Artificial Intelligence on AI*IA 2007: Artificial Intelligence and Human-Oriented Computing
  • Year:
  • 2007

Abstract

Word Sense Disambiguation (WSD) is traditionally considered an AI-hard problem. A breakthrough in this field would have a significant impact on many related fields, such as information retrieval and information extraction. This paper describes JIGSAW, a knowledge-based WSD algorithm that attempts to disambiguate all words in a text by exploiting WordNet senses. The main assumption is that a Part-Of-Speech (POS)-dependent strategy for WSD can turn out to be more effective than a single strategy applied to all words. The semantics provided by WSD give added value to applications centred on humans as users. Two empirical evaluations are described in the paper. First, we evaluated the accuracy of JIGSAW on Task 1 of the SEMEVAL-1 competition. This task measures the effectiveness of a WSD algorithm in an Information Retrieval System. For the second evaluation, we used semantically indexed documents obtained through a WSD process to train a naïve Bayes learner that infers "semantic" sense-based user profiles as binary text classifiers. The goal of the second empirical evaluation was to measure the accuracy of the user profiles in selecting relevant documents to be recommended within a document collection.
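
The second evaluation can be illustrated with a minimal sketch of a binary naïve Bayes classifier trained on documents represented as bags of sense identifiers rather than words. This is not the authors' implementation: the class name `SenseNaiveBayes`, the multinomial model with Laplace smoothing, the labels, and the sense identifiers below are all assumptions made for illustration, and the identifiers are placeholder strings rather than verified WordNet entries.

```python
import math
from collections import Counter


class SenseNaiveBayes:
    """Multinomial naive Bayes over bags of sense identifiers (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                # Laplace smoothing constant (assumed)
        self.class_doc_counts = Counter() # label -> number of training documents
        self.sense_counts = {}            # label -> Counter of sense identifiers
        self.vocabulary = set()           # all sense identifiers seen in training

    def fit(self, documents, labels):
        for senses, label in zip(documents, labels):
            self.class_doc_counts[label] += 1
            self.sense_counts.setdefault(label, Counter()).update(senses)
            self.vocabulary.update(senses)
        return self

    def log_scores(self, senses):
        """Return the unnormalised log posterior of each label for one document."""
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_docs in self.class_doc_counts.items():
            counts = self.sense_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # class prior
            for sense in senses:
                if sense in self.vocabulary:       # senses unseen in training are skipped
                    score += math.log((counts[sense] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, senses):
        scores = self.log_scores(senses)
        return max(scores, key=scores.get)


# Placeholder sense identifiers standing in for the output of a WSD step.
train_docs = [
    ["bank.n.01", "loan.n.01", "interest.n.04"],
    ["loan.n.01", "mortgage.n.01"],
    ["bank.n.02", "river.n.01"],
    ["river.n.01", "water.n.01"],
]
train_labels = ["relevant", "relevant", "not_relevant", "not_relevant"]

model = SenseNaiveBayes().fit(train_docs, train_labels)
finance_prediction = model.predict(["loan.n.01", "bank.n.01"])
river_prediction = model.predict(["river.n.01"])
```

Because the two readings of "bank" are distinct features here, a profile learned from finance documents does not gain evidence from documents about river banks, which is the kind of benefit a sense-based representation is meant to offer over a keyword-based one.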