Occam's razor is the principle that, given two hypotheses consistent with the observed data, the simpler one should be preferred. Many machine learning algorithms follow this principle and search for a small hypothesis within the version space. The principle has been the subject of a heated debate, with theoretical and empirical arguments both for and against it. Earlier empirical studies lacked sufficient coverage to resolve the debate. In this work we provide convincing empirical evidence for Occam's razor in the context of decision tree induction. By applying a variety of sophisticated sampling techniques, our methodology samples the version space for many real-world domains and tests the correlation between the size of a tree and its accuracy. We show that a smaller tree is indeed likely to be more accurate, and that this correlation is statistically significant across most domains.
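The abstract's methodology can be sketched in miniature: draw many decision trees from the hypothesis space, record each tree's size and held-out accuracy, and test whether the two are negatively correlated. The sketch below is an illustration only, not the paper's actual sampling procedure; it uses scikit-learn's randomized splitter as a crude stand-in sampler and a hypothetical benchmark dataset (`load_breast_cancer`) in place of the many real-world domains the paper covers.

```python
# Illustrative sketch (assumption: randomized trees as a stand-in for the
# paper's version-space sampling techniques, which are not described here).
from scipy.stats import spearmanr
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

sizes, accs = [], []
for seed in range(200):
    # splitter="random" picks split thresholds at random, so each seed
    # yields a different tree consistent with the training data.
    tree = DecisionTreeClassifier(splitter="random", random_state=seed)
    tree.fit(X_tr, y_tr)
    sizes.append(tree.tree_.node_count)   # tree size (total node count)
    accs.append(tree.score(X_te, y_te))   # held-out accuracy

# A negative rank correlation would mean smaller trees tend to be
# more accurate, in line with Occam's razor; p gauges significance.
rho, p = spearmanr(sizes, accs)
print(f"Spearman rho = {rho:.3f}, p = {p:.3g}")
```

In a full study one would repeat this per domain and aggregate the per-domain significance tests, as the abstract describes.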