A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms

  • Authors:
  • Tjen-Sien Lim, Wei-Yin Loh, Yu-Shan Shih

  • Affiliations:
  • Tjen-Sien Lim: Department of Statistics, University of Wisconsin, Madison, WI 53706, USA
  • Wei-Yin Loh: Department of Statistics, University of Wisconsin, Madison, WI 53706, USA (loh@stat.wisc.edu)
  • Yu-Shan Shih: Department of Mathematics, National Chung Cheng University, Chiayi 621, Taiwan, R.O.C. (yshih@math.ccu.edu.tw)

  • Venue:
  • Machine Learning
  • Year:
  • 2000

Abstract

Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classification accuracy, training time, and (in the case of trees) number of leaves. Classification accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based algorithm called POLYCLASS at the top, although it is not statistically significantly different from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is QUEST with linear splits, which ranks fourth and fifth on the two criteria, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times. POLYCLASS, for example, is third from last in terms of median training time; it often requires hours of training, compared to seconds for other algorithms. The QUEST and logistic regression algorithms are substantially faster. Among decision tree algorithms with univariate splits, C4.5, IND-CART, and QUEST have the best combinations of error rate and speed, but C4.5 tends to produce trees with twice as many leaves as those from IND-CART and QUEST.
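The abstract's two accuracy criteria, mean error rate and mean rank of error rate, can be illustrated with a small sketch. The algorithm names are from the paper, but the error rates below are invented purely for illustration, and tie handling is omitted for simplicity:

```python
# Hypothetical error rates (algorithm -> error rate on each dataset).
# These numbers are illustrative only, not results from the paper.
error_rates = {
    "POLYCLASS": [0.10, 0.20, 0.15],
    "logistic":  [0.12, 0.19, 0.16],
    "C4.5":      [0.11, 0.22, 0.18],
}
n_datasets = len(next(iter(error_rates.values())))

# Criterion 1: mean error rate across datasets, per algorithm.
mean_err = {a: sum(r) / len(r) for a, r in error_rates.items()}

# Criterion 2: mean rank of error rate. On each dataset, rank the
# algorithms by error (1 = lowest error), then average each
# algorithm's rank over all datasets.
def ranks_on(d):
    ordered = sorted(error_rates, key=lambda a: error_rates[a][d])
    return {a: i + 1 for i, a in enumerate(ordered)}

mean_rank = {
    a: sum(ranks_on(d)[a] for d in range(n_datasets)) / n_datasets
    for a in error_rates
}

best_by_err = min(mean_err, key=mean_err.get)
best_by_rank = min(mean_rank, key=mean_rank.get)
```

The two criteria can disagree: mean error rate is dominated by datasets with large absolute errors, while mean rank only records relative ordering per dataset. In the paper, both happen to favor POLYCLASS.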