Diversification for better classification trees

Authors:
Zhiwei Fu;Bruce L. Golden;Shreevardhan Lele;S. Raghavan;Edward Wasil
Affiliations:
Fannie Mae, Washington DC;R.H. Smith School of Business, University of Maryland, College Park, MD;R.H. Smith School of Business, University of Maryland, College Park, MD;R.H. Smith School of Business, University of Maryland, College Park, MD;Kogod School of Business, American University, Washington DC
Venue:
Computers and Operations Research
Year:
2006

Abstract

Classification trees are widely used in the data mining community. Typically, trees are constructed to maximize their mean classification accuracy. In this paper, we propose an alternative to using the mean accuracy as the performance measure of a tree. We investigate the use of various percentiles of the distribution of classification accuracy (representing the risk aversion of a decision maker) in place of the mean. We develop a genetic algorithm (GA) to build decision trees based on this new criterion. We then extend this GA to create diversity in the population explicitly by considering two fitness criteria simultaneously. We show that our bicriterion GA performs quite well, scales up to handle large data sets, and requires only a small sample of the original data to build a good decision tree.
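
To make the percentile criterion concrete, the following is a minimal sketch, not the authors' implementation. It assumes each tree has been scored on several test samples, and it compares ranking by mean accuracy with ranking by a low percentile of the accuracy distribution. The accuracy values, the `percentile` helper (nearest-rank definition), and the choice of the 10th percentile are all illustrative assumptions; the abstract does not specify them.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank p-th percentile (0 < p <= 100) of a non-empty list.

    Hypothetical helper: the paper may define percentiles differently.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def best_by(criterion, trees):
    """Return the name of the tree whose accuracies maximize the criterion."""
    return max(trees, key=lambda name: criterion(trees[name]))


# Made-up accuracies of two trees over ten test samples (assumption).
# Tree A is usually more accurate but occasionally fails badly;
# tree B is slightly less accurate on average but very stable.
trees = {
    "A": [0.96, 0.95, 0.94, 0.93, 0.92, 0.91, 0.90, 0.89, 0.60, 0.58],
    "B": [0.86, 0.86, 0.85, 0.85, 0.85, 0.84, 0.84, 0.84, 0.83, 0.83],
}

by_mean = best_by(mean, trees)
by_p10 = best_by(lambda acc: percentile(acc, 10), trees)

print("best by mean accuracy:  ", by_mean)
print("best by 10th percentile:", by_p10)
```

In this toy data the mean favors tree A, while the 10th percentile favors tree B, which is the kind of preference a risk-averse decision maker might express. In a GA, either quantity could serve as the fitness of a candidate tree.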