Statistical classification techniques and machine learning methods have been applied to several Information Retrieval (IR) problems: routing, filtering, and categorization. Most of these methods are awkward, and sometimes intractable, in high-dimensional feature spaces. To reduce dimensionality, feature selection has been introduced as a preprocessing step. In this paper, we assess to what extent feature selection can be used without causing a loss in effectiveness. This question can now be addressed because a couple of recent learners do not require a preprocessing step, which makes a comparison against the full feature set possible. On a text categorization task using the Reuters-22173 collection, we give empirical evidence that feature selection is useful. First, the size of the collection index can be drastically reduced without causing a significant loss in categorization effectiveness. Second, feature selection reduces the time required to automatically build the categorization system.
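To make the idea of feature selection as a preprocessing step concrete, the following is a minimal sketch. The abstract does not state which selection criterion is used, so the sketch assumes a simple document-frequency ranking purely for illustration. The function names `select_features` and `project` and the toy documents are hypothetical and do not come from the paper.

```python
from collections import Counter


def select_features(docs, k):
    """Keep the k terms that occur in the most documents.

    Assumption: document frequency is the selection criterion.
    The paper itself may use a different one.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    # Rank by descending document frequency; break ties alphabetically
    # so the result is deterministic.
    ranked = sorted(df, key=lambda term: (-df[term], term))
    return set(ranked[:k])


def project(docs, vocab):
    """Drop every token that is not in the selected vocabulary."""
    return [[t for t in tokens if t in vocab] for tokens in docs]


# Toy collection (hypothetical data, not taken from Reuters).
docs = [
    ["wheat", "export", "tonnes"],
    ["wheat", "price", "rise"],
    ["oil", "price", "export"],
]

full_vocab = {t for tokens in docs for t in tokens}
vocab = select_features(docs, k=3)
reduced = project(docs, vocab)

print(len(full_vocab), "terms before selection,", len(vocab), "after")
```

A learner trained on `reduced` sees a smaller index than one trained on `docs`. The abstract's question is how far the vocabulary can shrink before categorization effectiveness degrades.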