Scoring and Selecting Terms for Text Categorization

  • Authors: Elena Montanes; Irene Diaz; Jose Ranilla; Elias F. Combarro; Javier Fernandez
  • Affiliations: University of Oviedo (all authors)
  • Venue: IEEE Intelligent Systems
  • Year: 2005

Abstract

Machine learning has become one of the main approaches to text categorization. Because text domains contain a great deal of irrelevant information, effective feature reduction is essential for improving both the effectiveness and the efficiency of classifiers. A set of new scoring measures for feature selection, taken from the machine learning domain, was evaluated on two well-known document collections. In certain situations, some of these measures outperformed traditional measures from information retrieval and information theory.
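
As a rough illustration of what scoring and selecting terms involves, the sketch below ranks terms by information gain, one of the traditional information-theory measures, and keeps the top k. It is not one of the measures proposed in the paper. The function names (`entropy`, `information_gain`, `select_terms`) and the four-document toy corpus are assumptions made up for this example.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    n = len(labels)
    h_before = entropy(list(Counter(labels).values()))
    # Class entropy after splitting on the term, weighted by partition size.
    h_after = sum(
        (len(part) / n) * entropy(list(Counter(part).values()))
        for part in (with_term, without_term)
    )
    return h_before - h_after

def select_terms(docs, labels, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(
        vocabulary,
        key=lambda t: information_gain(docs, labels, t),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"goal", "match", "the"},
    {"match", "team", "the"},
    {"stock", "market", "the"},
    {"market", "price", "the"},
]
labels = ["sport", "sport", "finance", "finance"]

print(select_terms(docs, labels, 2))
```

In this toy corpus, "the" occurs in every document and scores zero, so it would be dropped first. "match" and "market" each separate the two categories perfectly and rank highest.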