Does a new simple Gaussian weighting approach perform well in text categorization?

  • Authors:
  • Giorgio Maria Di Nunzio;Alessandro Micarelli

  • Affiliations:
  • Dip. di Ingegneria dell'Informazione, Universita degli Studi di Padova, Padova, Italia;Dip. di Informatica e Automazione, Universita degli Studi "Roma Tre", Roma, Italia

  • Venue:
  • IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

A new approach to the Text Categorization problem is here presented. It is called Gaussian Weighting and it is a supervised learning algorithm that, during the training phase, estimates two very simple and easily computable statistics which are: the Presence P, how much a term is present in a category c in the Expressiveness E, how much is present outside c in the rest of the domain. Once the system has learned this information, a Gaussian function is shaped for each term of a category, in order to assign the term a weight that estimates the level of its importance for that particular category. We tested our learning method on the task of single-label classification using the Reuters-21578 benchmark. The outcome of the result was quite impressive: in different experimental setups, we reached a micro-averaged Fl-measure of 0.89, with a peak of 0.899. Moreover, a macro-averaged Recall and Precision was calculated: the former reported a 0.72, the latter a 0.79. These results reach most of the state-of-the-art techniques of machine learning applied to Text Categorization, demonstrating that this new weighting scheme does perform well on this particular task.