Semi-analytical method for analyzing models and model selection measures based on moment analysis

Authors:
Amit Dhurandhar; Alin Dobra
Affiliations:
University of Florida, Gainesville, FL; University of Florida, Gainesville, FL
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2009

Abstract

In this article we propose a moment-based method for studying models and model selection measures. By focusing on the probabilistic space of classifiers induced by the classification algorithm, rather than on the space of datasets, we obtain efficient characterizations for computing the moments. Because the resulting formulae are too complicated to interpret directly, we visualize them. Assuming that the data are drawn independently and identically distributed from the underlying probability distribution, and going over the space of all possible datasets, we establish general relationships between the generalization error, hold-out-set error, cross-validation error, and leave-one-out error. Finally, we illustrate the method and the results by studying the behavior of these errors for the naive Bayes classifier.
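
To make the quantities in the abstract concrete, the following is a minimal Monte Carlo sketch, not the semi-analytical method of the article. It draws i.i.d. datasets from a small, hypothetical joint distribution, trains a one-attribute naive Bayes classifier on each, and estimates the first two moments of the generalization error (GE) and the hold-out-set error (HE). The distribution `JOINT`, the sample sizes, and all function names are assumptions introduced purely for illustration.

```python
import random
from statistics import mean, pvariance

# Hypothetical joint distribution P(x, y): one binary attribute x, binary label y.
JOINT = {(0, 0): 0.35, (0, 1): 0.15, (1, 0): 0.10, (1, 1): 0.40}


def sample(n, rng):
    """Draw n i.i.d. (x, y) pairs from JOINT."""
    keys = list(JOINT)
    weights = [JOINT[k] for k in keys]
    return rng.choices(keys, weights=weights, k=n)


def train(data):
    """One-attribute naive Bayes: predict argmax_y count(x, y); ties go to label 0."""
    counts = {k: 0 for k in JOINT}
    for xy in data:
        counts[xy] += 1
    return {x: int(counts[(x, 1)] > counts[(x, 0)]) for x in (0, 1)}


def generalization_error(clf):
    """Exact error of clf under the known (assumed) distribution."""
    return sum(p for (x, y), p in JOINT.items() if clf[x] != y)


def holdout_error(clf, test):
    """Fraction of held-out points that clf misclassifies."""
    return sum(clf[x] != y for x, y in test) / len(test)


def estimate_moments(n_train=20, n_test=10, trials=4000, seed=0):
    """Estimate mean and variance of GE and HE over randomly drawn datasets."""
    rng = random.Random(seed)
    ge, he = [], []
    for _ in range(trials):
        data = sample(n_train + n_test, rng)
        clf = train(data[:n_train])
        ge.append(generalization_error(clf))
        he.append(holdout_error(clf, data[n_train:]))
    return {"GE": (mean(ge), pvariance(ge)), "HE": (mean(he), pvariance(he))}


if __name__ == "__main__":
    for name, (m, v) in estimate_moments().items():
        print(f"{name}: mean={m:.4f} variance={v:.5f}")
```

Under these assumptions the two estimated means should roughly agree, since the hold-out error is an unbiased estimate of the generalization error of the classifier trained on the training portion, while the hold-out error should show the larger variance because of the extra sampling noise from the finite test set.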