Discovering User Profiles from Semantically Indexed Scientific Papers

  • Authors:
  • Giovanni Semeraro;Pierpaolo Basile;Marco Gemmis;Pasquale Lops

  • Affiliations:
  • Department of Informatics, University of Bari, Via E. Orabona, 4 - 70125 Bari, Italia;Department of Informatics, University of Bari, Via E. Orabona, 4 - 70125 Bari, Italia;Department of Informatics, University of Bari, Via E. Orabona, 4 - 70125 Bari, Italia;Department of Informatics, University of Bari, Via E. Orabona, 4 - 70125 Bari, Italia

  • Venue:
  • From Web to Social Web: Discovering and Deploying User and Content Profiles
  • Year:
  • 2007

Abstract

Typically, personalized information recommendation services automatically infer the user profile, a structured model of the user's interests, from documents that the user has already deemed relevant. We present an approach based on Word Sense Disambiguation (WSD) for the extraction of user profiles from documents. This approach relies on a knowledge-based WSD algorithm, called JIGSAW, for the semantic indexing of documents: JIGSAW exploits the WordNet lexical database to select, among all the possible meanings (senses) of a polysemous word, the correct one. Semantically indexed documents are used to train a naïve Bayes learner that infers "semantic", sense-based user profiles as binary text classifiers (user-likes and user-dislikes). Two empirical evaluations are described in the paper. In the first experimental session, JIGSAW was evaluated according to the parameters of the Senseval-3 initiative, which provides a forum where WSD systems are assessed against disambiguated datasets. The goal of the second empirical evaluation was to measure the accuracy of the user profiles in selecting relevant documents to be recommended. The performance of classical keyword-based profiles was compared to that of sense-based profiles in the task of recommending scientific papers. The results show that sense-based profiles outperform keyword-based ones.
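
The pipeline outlined in the abstract (disambiguate words into senses, then learn a likes/dislikes classifier over those senses) can be illustrated with a minimal Python sketch. This is not JIGSAW and not the authors' implementation: the `SENSES` table is a hypothetical toy stand-in for WordNet, `disambiguate` is a simplified context-overlap heuristic, and `NaiveBayesProfile` is an illustrative multinomial naive Bayes with add-one smoothing. All names and the training data are assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical toy sense inventory standing in for WordNet.
# Each sense id maps to a few cue words; these are NOT real synset ids.
SENSES = {
    "bank": {
        "bank#finance": {"money", "loan", "credit"},
        "bank#river": {"river", "water", "shore"},
    },
}

def disambiguate(tokens):
    """Replace each polysemous token with the sense whose cue words overlap
    the document context most (a simplified heuristic, not JIGSAW)."""
    context = set(tokens)
    indexed = []
    for tok in tokens:
        candidates = SENSES.get(tok)
        if not candidates:
            indexed.append(tok)  # word not in the toy inventory: keep as-is
            continue
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(candidates), key=lambda s: len(candidates[s] & context))
        indexed.append(best)
    return indexed

class NaiveBayesProfile:
    """Binary sense-based profile: classes 'likes' and 'dislikes'."""
    CLASSES = ("likes", "dislikes")

    def __init__(self):
        self.doc_counts = Counter()
        self.feature_counts = {c: Counter() for c in self.CLASSES}
        self.vocab = set()

    def train(self, labelled_docs):
        # Assumes at least one training document per class.
        for tokens, label in labelled_docs:
            feats = disambiguate(tokens)
            self.doc_counts[label] += 1
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)

    def log_score(self, tokens, label):
        feats = disambiguate(tokens)
        prior = self.doc_counts[label] / sum(self.doc_counts.values())
        # Add-one (Laplace) smoothing over the sense vocabulary
        denom = sum(self.feature_counts[label].values()) + len(self.vocab)
        score = math.log(prior)
        for f in feats:
            score += math.log((self.feature_counts[label][f] + 1) / denom)
        return score

    def classify(self, tokens):
        return max(self.CLASSES, key=lambda c: self.log_score(tokens, c))

# Made-up training data: the user likes finance papers, dislikes hydrology ones.
profile = NaiveBayesProfile()
profile.train([
    (["bank", "loan", "credit", "risk"], "likes"),
    (["bank", "money", "market"], "likes"),
    (["bank", "river", "erosion"], "dislikes"),
    (["river", "water", "bank", "flood"], "dislikes"),
])
```

In this toy setting the keyword "bank" occurs in every training document and so carries no signal on its own, whereas the sense features `bank#finance` and `bank#river` separate the two classes; `profile.classify(["bank", "loan"])` and `profile.classify(["bank", "river"])` therefore fall on opposite sides.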