Feature selection: a useful preprocessing step

Authors:
Isabelle Moulinier
Affiliations:
LIP6, Université P. et M. Curie, Paris, France
Venue:
IRSG'97 Proceedings of the 19th Annual BCS-IRSG conference on Information Retrieval Research
Year:
1997

Abstract

Statistical classification techniques and machine learning methods have been applied to some Information Retrieval (IR) problems: routing, filtering and categorization. Most of these methods are awkward, and sometimes intractable, in high-dimensional feature spaces. To reduce dimensionality, feature selection has been introduced as a preprocessing step. In this paper, we assess to what extent feature selection can be used without causing a loss in effectiveness. This question can be addressed because a couple of recent learners do not require a preprocessing step. On a text categorization task using the Reuters-22173 collection, we give empirical evidence that feature selection is useful. First, the size of the collection index can be drastically reduced without causing a significant loss in categorization effectiveness. Second, feature selection reduces the time required to automatically build the categorization system.