Feature ranking is a fundamental machine learning task with various applications, including feature selection and decision tree learning. We describe and analyze a new feature ranking method that supports categorical features with a large number of possible values. We show that existing ranking criteria rank a feature according to the training error of a predictor based on that feature, an approach that can fail when ranking categorical features with many values. We propose the Ginger ranking criterion, which estimates the generalization error of the predictor associated with the Gini index. We show that for almost all training sets, the Ginger criterion produces an accurate estimate of the true generalization error, regardless of the number of values the categorical feature takes. We also address the question of finding the optimal predictor based on a single categorical feature, and show that the predictor associated with the misclassification error criterion has the minimal expected generalization error. We bound the bias of this predictor with respect to the generalization error of the Bayes optimal predictor, and analyze its concentration properties. We demonstrate the efficiency of our approach for feature selection and for learning decision trees in a series of experiments with synthetic and natural data sets.
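The failure mode described above can be made concrete. The following minimal Python sketch (it is not the paper's Ginger criterion; the data, feature names, and parameters are illustrative assumptions) ranks two features by the training error of the single-feature majority-vote predictor, and shows that a pure-noise feature with many distinct values can attain a lower training error than a genuinely informative binary feature.

# Illustrative sketch only: ranking categorical features by the training error
# of the single-feature majority-vote predictor. All data and names here are
# hypothetical; this demonstrates the overfitting problem, not the Ginger fix.
import random
from collections import Counter, defaultdict

def training_error(xs, ys):
    """Training error of the predictor that outputs, for each feature value,
    the majority label observed for that value in the training set."""
    by_value = defaultdict(list)
    for x, y in zip(xs, ys):
        by_value[x].append(y)
    errors = 0
    for labels in by_value.values():
        majority_count = Counter(labels).most_common(1)[0][1]
        errors += len(labels) - majority_count
    return errors / len(ys)

random.seed(0)
n = 200
ys = [random.randint(0, 1) for _ in range(n)]

# Informative binary feature: agrees with the label 80% of the time.
informative = [y if random.random() < 0.8 else 1 - y for y in ys]
# Uninformative feature with many distinct values (an ID-like column).
noisy_id = [random.randrange(150) for _ in range(n)]

print("informative binary feature, training error:", training_error(informative, ys))
print("many-valued noise feature, training error:", training_error(noisy_id, ys))
# The noise feature typically attains a near-zero training error because most of
# its values occur only once or twice, so a training-error-based criterion would
# rank it above the informative feature; a generalization-based criterion such as
# the one proposed in the paper is designed to avoid exactly this bias.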