Searching the feature space for a subset that yields optimum performance tends to be expensive, especially in applications where the cardinality of the feature space is high (e.g., text categorization). This is particularly true for massive datasets and for learning algorithms with worse-than-linear scaling behavior. Linear Support Vector Machines (SVMs) are among the top performers in the text classification domain and often work best with very rich feature representations. Even they, however, benefit from reducing the number of features, sometimes substantially. In this work we propose alternatives to the exact re-induction of SVM models during the search for the optimum feature subset. The approximations offer substantial gains in computational efficiency. We demonstrate that no significant compromises in model quality are made and, moreover, that in some cases gains in accuracy can be achieved.
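The contrast between exact re-induction and a cheaper approximation can be sketched in code. The snippet below is a minimal, hypothetical illustration, not the paper's actual method: it implements recursive feature elimination over a linear classifier trained by hinge-loss subgradient descent (a stand-in for an exact SVM solver), and `retrain=False` shows one common approximation in which the weights of a single trained model rank all features, avoiding re-induction at every elimination step. All function names, hyperparameters, and the toy data are assumptions made for the sketch.

```python
import numpy as np

def train_linear(X, y, epochs=200, lr=0.1, lam=0.01):
    # Linear classifier trained by subgradient descent on the
    # regularized hinge loss; a cheap stand-in for an exact SVM solver.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        margins = y * (X @ w)
        mask = margins < 1  # margin violators drive the update
        grad = lam * w - (y[mask, None] * X[mask]).mean(axis=0) if mask.any() else lam * w
        w -= lr * grad
    return w

def rfe(X, y, n_keep, retrain=True):
    # Recursive feature elimination: repeatedly drop the feature with
    # the smallest |weight|. retrain=True re-induces the model after
    # every elimination (exact but expensive); retrain=False reuses the
    # weights of the initial model (the cheap approximation).
    active = list(range(X.shape[1]))
    w = train_linear(X[:, active], y)
    while len(active) > n_keep:
        if retrain:
            w = train_linear(X[:, active], y)
        i = int(np.argmin(np.abs(w)))
        del active[i]
        w = np.delete(w, i)
    return active

# Toy data: features 0 and 1 carry the label, feature 2 is pure noise.
rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=200)
X = np.column_stack([y + 0.1 * rng.normal(size=200),
                     y + 0.1 * rng.normal(size=200),
                     rng.normal(size=200)])
kept_exact = rfe(X, y, n_keep=2)                 # with re-induction
kept_approx = rfe(X, y, n_keep=2, retrain=False)  # single-model ranking
```

On this easy toy problem both variants keep the two informative features; the paper's point is that on real text collections the approximation saves the dominant cost, the repeated SVM training runs, while keeping the selected subsets of comparable quality.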