The impact of conceptualization on text classification

  • Authors:
  • Shereen Albitar;Sébastien Fournier;Bernard Espinasse

  • Affiliations:
  • LSIS, Université d'Aix-Marseille, Marseille, France;LSIS, Université d'Aix-Marseille, Marseille, France;LSIS, Université d'Aix-Marseille, Marseille, France

  • Venue:
  • WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
  • Year:
  • 2012

Abstract

To make search on the Internet more efficient, it seems appropriate to deploy classification techniques that use semantic resources to restrict the search to the user's domain of interest. In this work, we assess the impact of integrating semantic knowledge into text classification. This integration can be realized in different ways; the one we choose in this paper is conceptualization. We examine the impact of different conceptualization strategies on text classification using three traditional text classification methods: Rocchio, Support Vector Machines (SVMs), and Naïve Bayes (NB). We restrict our experiments to the biomedical domain: conceptualization is applied to the OHSUMED corpus by mapping terms in the text to their corresponding concepts in the UMLS Metathesaurus, so that their meaning is taken into consideration during text classification. The three classifiers are tested with the different conceptualization strategies in order to evaluate their effect on classification. Preliminary results show promising improvements.
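
As a rough illustration of what mapping terms to concepts can look like before classification, the sketch below applies three possible strategies to a tokenized text. Everything in it is an assumption made for illustration: the abstract does not name the strategies it evaluates, so the strategy names ("add", "replace", "concepts_only"), the function `conceptualize`, and the toy dictionary `TERM_TO_CONCEPT` are hypothetical, and the concept identifiers are placeholders rather than real UMLS Metathesaurus entries.

```python
from typing import Dict, List

# Hypothetical toy mapping from surface terms to concept identifiers.
# The identifiers are placeholders, not real UMLS Metathesaurus concepts.
TERM_TO_CONCEPT: Dict[str, str] = {
    "heart": "CONCEPT_HEART",
    "cardiac": "CONCEPT_HEART",
    "attack": "CONCEPT_ATTACK",
}


def conceptualize(tokens: List[str], mapping: Dict[str, str], strategy: str) -> List[str]:
    """Return the feature tokens for one document under an assumed strategy.

    "add":           keep every term and append its concept when one exists
    "replace":       substitute a term by its concept when one exists
    "concepts_only": keep only concepts and drop unmapped terms
    """
    if strategy not in ("add", "replace", "concepts_only"):
        raise ValueError(f"unknown strategy: {strategy}")

    features: List[str] = []
    for token in tokens:
        # Lookup is case-insensitive; the original token casing is preserved.
        concept = mapping.get(token.lower())
        if strategy == "add":
            features.append(token)
            if concept is not None:
                features.append(concept)
        elif strategy == "replace":
            features.append(concept if concept is not None else token)
        else:  # concepts_only
            if concept is not None:
                features.append(concept)
    return features


if __name__ == "__main__":
    doc = ["Cardiac", "arrest", "patient"]
    for name in ("add", "replace", "concepts_only"):
        print(name, conceptualize(doc, TERM_TO_CONCEPT, name))
```

In this sketch, "cardiac" and "heart" map to the same placeholder concept, which is the kind of effect conceptualization aims for: two documents using different surface terms can share a feature. The resulting token lists could then be vectorized and passed to any of the classifiers named in the abstract.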