Feature selection: a useful preprocessing step

  • Authors:
  • Isabelle Moulinier

  • Affiliations:
  • LIP6, Université P. et M. Curie, Paris, France

  • Venue:
  • IRSG'97 Proceedings of the 19th Annual BCS-IRSG conference on Information Retrieval Research
  • Year:
  • 1997

Abstract

Statistical classification techniques and machine learning methods have been applied to some Information Retrieval (IR) problems: routing, filtering and categorization. Most of these methods are awkward to use, and sometimes intractable, in high-dimensional feature spaces. To reduce dimensionality, feature selection has been introduced as a preprocessing step. In this paper, we assess to what extent feature selection can be applied without causing a loss in effectiveness. This question can now be studied because a few recent learners do not require such a preprocessing step. On a text categorization task using the Reuters-22173 collection, we give empirical evidence that feature selection is useful. First, the size of the collection index can be drastically reduced without a significant loss in categorization effectiveness. Second, feature selection reduces the time required to automatically build the categorization system.
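The abstract does not say which selection criterion is used, so the following is only a minimal sketch of feature selection as an index-reduction step, assuming a chi-square score between term presence and membership in a single binary category. Chi-square is one common criterion, swapped in here for illustration; it is not necessarily the one evaluated in the paper. The function names (`chi_square_scores`, `select_features`, `reduce_index`) are hypothetical, and documents are assumed to be already tokenized.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by the chi-square statistic between term presence
    and membership in the positive category (labels are booleans)."""
    n = len(docs)
    n_pos = sum(labels)
    df_all = Counter()  # number of documents containing the term
    df_pos = Counter()  # number of positive documents containing the term
    for tokens, is_pos in zip(docs, labels):
        for term in set(tokens):
            df_all[term] += 1
            if is_pos:
                df_pos[term] += 1
    scores = {}
    for term, df in df_all.items():
        a = df_pos[term]        # term present, positive document
        b = df - a              # term present, negative document
        c = n_pos - a           # term absent, positive document
        d = n - n_pos - b       # term absent, negative document
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def reduce_index(docs, vocabulary):
    """Drop every term outside the selected vocabulary from each document."""
    return [[t for t in tokens if t in vocabulary] for tokens in docs]
```

On a toy collection (hypothetical data, not Reuters), keeping only the two most discriminative terms shrinks each document to the terms most associated with the category:

```python
docs = [["wheat", "price", "rise"], ["wheat", "export"],
        ["oil", "price"], ["oil", "opec", "export"]]
labels = [True, True, False, False]  # e.g. a hypothetical "grain" category
vocab = select_features(docs, labels, k=2)  # {"oil", "wheat"}
reduced = reduce_index(docs, vocab)         # [["wheat"], ["wheat"], ["oil"], ["oil"]]
```

Terms such as "price" and "export", which occur equally often inside and outside the category, score zero and are the first to be discarded. This is the kind of reduction that shrinks the index and the learner's input before training.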