We describe and analyze a new approach to feature ranking in the presence of categorical features with a large number of possible values. It is shown that popular ranking criteria, such as the Gini index and the misclassification error, can be interpreted as the training error of a predictor deduced from the training set. It is then argued that the generalization error is a more adequate ranking criterion. We propose a modification of the Gini index criterion based on a robust estimation of the generalization error of the predictor associated with the Gini index. The properties of this new estimator are analyzed, showing that for most training sets it produces an accurate estimate of the true generalization error. We then address the question of finding the optimal predictor based on a single categorical feature. It is shown that the predictor associated with the misclassification error criterion has the minimal expected generalization error. We bound the bias of this predictor with respect to the generalization error of the Bayes optimal predictor, and analyze its concentration properties.
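The interpretation above can be illustrated concretely. A minimal sketch, assuming binary labels: for each value of a categorical feature, the Gini index corresponds to the training error of a randomized predictor that labels an example 1 with the empirical conditional probability, while the misclassification error corresponds to the training error of the majority-vote predictor. The function name and signature here are illustrative, not from the paper.

```python
from collections import Counter, defaultdict

def ranking_scores(xs, ys):
    """Score one categorical feature under two classic criteria, each
    viewed as the training error of a predictor induced from the sample.
    xs: feature values, ys: binary labels in {0, 1}. Illustrative sketch."""
    n = len(xs)
    counts = defaultdict(Counter)              # feature value -> label counts
    for x, y in zip(xs, ys):
        counts[x][y] += 1
    gini, miscls = 0.0, 0.0
    for c in counts.values():
        n_v = sum(c.values())
        p = c[1] / n_v                         # empirical P(Y=1 | X=v)
        w = n_v / n                            # empirical P(X=v)
        gini += w * 2 * p * (1 - p)            # error of the randomized (Gini) predictor
        miscls += w * min(p, 1 - p)            # error of the majority-vote predictor
    return gini, miscls
```

Note that both scores are computed on the training set itself; for a feature with many rare values each `p` is estimated from few examples, which is exactly why these training errors can be optimistically biased relative to the generalization error the abstract argues should be used instead.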