An effective sampling method for decision trees considering comprehensibility and accuracy

Authors:
Hyontai Sug
Affiliations:
Division of Computer and Information Engineering, Dongseo University, Busan, Republic of Korea
Venue:
WSEAS Transactions on Computers
Year:
2009



Abstract

Because the target domains of data mining with decision trees usually contain large amounts of data, sampling is needed. However, selecting proper samples for a given decision tree algorithm is not easy, because each decision tree algorithm has its own characteristics in generating trees, and selecting samples that represent the given target data set well is difficult. As the sample size grows, the generated decision trees also grow, with some improvement in error rates. But we cannot keep using larger and larger samples, because large decision trees are hard to understand and overfitting can occur. This paper suggests a progressive approach to determining a proper sample size that yields good decision trees with respect to both tree size and accuracy. Experiments with two representative decision tree algorithms, CART and C4.5, show very promising results.
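The abstract does not state the exact stopping rule, so the following is only a minimal sketch of a generic progressive-sampling loop under assumed choices, not the paper's method. It assumes a geometric schedule (start at `n0`, multiply by `factor`), a minimum error improvement `min_gain` below which a larger sample is judged not worth its larger tree, and an optional comprehensibility cap `max_tree_size`. All of these names and thresholds are hypothetical. The `train_and_evaluate` callable is assumed to draw a random sample of size `n`, train a tree (for example with CART or C4.5), and return the tree's node count and its error rate on held-out data.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    sample_size: int
    tree_size: int     # e.g. number of nodes in the induced tree
    error_rate: float  # error on held-out data


def progressive_sample_size(
    train_and_evaluate: Callable[[int], Tuple[int, float]],
    n_total: int,
    n0: int = 100,
    factor: float = 2.0,
    min_gain: float = 0.005,
    max_tree_size: Optional[int] = None,
) -> Tuple[int, List[Step]]:
    """Grow the sample geometrically and return (chosen size, history).

    Hypothetical stopping rule: stop when the tree exceeds max_tree_size
    or when the error drops by less than min_gain; in both cases the
    previous (smaller) sample size is kept.
    """
    if factor <= 1.0:
        raise ValueError("factor must be > 1 so the sample keeps growing")
    history: List[Step] = []
    chosen: Optional[Step] = None
    n = min(n0, n_total)
    while True:
        tree_size, error_rate = train_and_evaluate(n)
        step = Step(n, tree_size, error_rate)
        history.append(step)
        if max_tree_size is not None and tree_size > max_tree_size:
            break  # tree no longer comprehensible: keep previous choice
        if chosen is not None and chosen.error_rate - error_rate < min_gain:
            break  # accuracy has plateaued: larger tree is not worth it
        chosen = step
        if n >= n_total:
            break  # the whole data set has been used
        # Geometric schedule; max(...) guarantees progress for small n.
        n = min(max(n + 1, int(n * factor)), n_total)
    # If even the first tree was too large, fall back to the first sample.
    return (chosen or history[0]).sample_size, history
```

With a CART-style learner such as scikit-learn's `DecisionTreeClassifier`, `train_and_evaluate` could return the fitted `clf.tree_.node_count` and one minus the held-out accuracy. Keeping the previous sample size when the gain stalls is a deliberate bias toward the smaller, more comprehensible tree.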