Conceptualization Effects on MEDLINE Documents Classification Using Rocchio Method

  • Authors:
  • Shereen Albitar;Sebastien Fournier;Bernard Espinasse

  • Venue:
  • WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
  • Year:
  • 2012

Abstract

This paper proposes a supervised text classification method for the biomedical domain that uses semantic resources. We choose Rocchio, a traditional text classification method, for its scalability and its extensibility with semantic knowledge. Semantic aspects are integrated into Rocchio through a conceptualization task: terms extracted from the text are mapped to their corresponding concepts in the UMLS® Metathesaurus®, so that meaning is taken into account during classification. The proposed classifier is tested on the Ohsumed text corpus, which is composed of abstracts of biomedical articles retrieved from the MEDLINE® database. The effects of conceptualization on Rocchio's performance are discussed with respect to different standard similarity measures and a variety of conceptualization strategies.
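
The pipeline described in the abstract (map terms to concepts, then classify with Rocchio) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything named here is an assumption made for illustration: the toy lookup table `TOY_CONCEPTS` and its placeholder concept IDs stand in for a real UMLS Metathesaurus lookup, the strategy names `"none"`, `"replace"`, and `"add"` are hypothetical examples of conceptualization strategies, cosine is used as one standard similarity measure, and the classifier is the simple positive-centroid variant of Rocchio (no negative-example term).

```python
import math
from collections import Counter, defaultdict

# Hypothetical term-to-concept table with placeholder IDs (assumption);
# a real system would look terms up in the UMLS Metathesaurus instead.
TOY_CONCEPTS = {
    "infarction": "CONCEPT_A",
    "mi": "CONCEPT_A",
    "tumor": "CONCEPT_B",
    "neoplasm": "CONCEPT_B",
}

def conceptualize(tokens, strategy="replace"):
    """Apply an illustrative conceptualization strategy to a token list.

    "none":    keep the tokens unchanged (baseline)
    "replace": swap each mapped token for its concept ID
    "add":     keep each mapped token and add its concept ID next to it
    """
    if strategy not in ("none", "replace", "add"):
        raise ValueError(f"unknown strategy: {strategy}")
    out = []
    for tok in tokens:
        concept = TOY_CONCEPTS.get(tok)
        if concept is None or strategy == "none":
            out.append(tok)
        elif strategy == "replace":
            out.append(concept)
        else:  # "add"
            out.extend([tok, concept])
    return out

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train_centroids(labeled_docs, strategy):
    """Build one centroid per class: the mean term-frequency vector."""
    sums = defaultdict(Counter)
    counts = Counter()
    for tokens, label in labeled_docs:
        sums[label].update(conceptualize(tokens, strategy))
        counts[label] += 1
    return {
        label: {k: v / counts[label] for k, v in vec.items()}
        for label, vec in sums.items()
    }

def classify(tokens, centroids, strategy):
    """Assign the class whose centroid is most similar to the document."""
    vec = Counter(conceptualize(tokens, strategy))
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Toy training data (invented for illustration, not from Ohsumed).
train = [
    (["infarction", "patient", "aspirin"], "cardio"),
    (["tumor", "biopsy", "patient"], "onco"),
]
centroids = train_centroids(train, "replace")
print(classify(["mi", "treatment"], centroids, "replace"))      # cardio
print(classify(["neoplasm", "treatment"], centroids, "replace"))  # onco
```

In this toy data the test documents share no surface term with either training class, so only the shared concept ID links "mi" to "infarction". That is the kind of effect the conceptualization step is meant to capture; how much it helps on real data depends on the strategy and the similarity measure, which is what the paper evaluates.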