A fast decision tree learning algorithm

  • Authors:
  • Jiang Su and Harry Zhang

  • Affiliations:
  • Faculty of Computer Science, University of New Brunswick, NB, Canada (both authors)

  • Venue:
  • AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence, Volume 1
  • Year:
  • 2006

Abstract

There is growing interest in scaling up the widely used decision-tree learning algorithms to very large data sets. Although numerous diverse techniques have been proposed, a fast tree-growing algorithm that avoids both a substantial decrease in accuracy and a substantial increase in space complexity is still needed. In this paper, we present a novel, fast decision-tree learning algorithm based on a conditional independence assumption. The new algorithm has a time complexity of O(m · n), where m is the size of the training data and n is the number of attributes. This is a significant asymptotic improvement over the O(m · n²) time complexity of the standard decision-tree learning algorithm C4.5, at the cost of only O(n) additional space. Experiments show that our algorithm is competitive with C4.5 in accuracy on a large number of UCI benchmark data sets, and is more accurate and significantly faster than C4.5 on a large number of text classification data sets. Our algorithm's time complexity is as low as that of naive Bayes; indeed, it is as fast as naive Bayes but, in our experiments, outperforms naive Bayes in accuracy. It is a core tree-growing algorithm that can be combined with other scaling-up techniques to achieve further speedup.
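
To make the complexity claim concrete, the following minimal Python sketch (not the authors' exact procedure; names such as collect_counts and class_dist_at_node are illustrative) shows the underlying idea: a naive-Bayes-style conditional independence assumption lets the learner estimate the class distribution at any tree node from counts gathered in a single O(m · n) pass, so candidate splits can be scored without re-partitioning the data at every node.

    import math
    from collections import defaultdict

    def collect_counts(X, y):
        # One pass over the training data: global class counts and, for each
        # attribute, value-conditional class counts. O(m * n) time in total.
        class_counts = defaultdict(int)
        cond = [defaultdict(lambda: defaultdict(int)) for _ in range(len(X[0]))]
        for row, c in zip(X, y):
            class_counts[c] += 1
            for i, v in enumerate(row):
                cond[i][v][c] += 1
        return class_counts, cond

    def class_dist_at_node(path, class_counts, cond, m):
        # Estimate P(c | path) under the conditional independence assumption
        # P(v_1, ..., v_k | c) ≈ prod_i P(v_i | c), with Laplace smoothing.
        # This stands in for physically partitioning the data at every node.
        dist = {}
        for c, m_c in class_counts.items():
            p = m_c / m
            for attr, value in path:
                p *= (cond[attr][value][c] + 1) / (m_c + len(cond[attr]))
            dist[c] = p
        total = sum(dist.values()) or 1.0
        return {c: p / total for c, p in dist.items()}

    def entropy(dist):
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    # Usage: score candidate root splits from the counts alone, no rescans.
    X = [["sunny", "hot"], ["sunny", "mild"], ["rain", "mild"], ["rain", "hot"]]
    y = ["no", "no", "yes", "yes"]
    class_counts, cond = collect_counts(X, y)
    m = len(X)
    for attr in range(len(X[0])):
        expected = 0.0
        for value in list(cond[attr]):
            weight = sum(cond[attr][value].values()) / m
            child = class_dist_at_node([(attr, value)], class_counts, cond, m)
            expected += weight * entropy(child)
        print(f"attribute {attr}: expected entropy {expected:.3f}")

Because each node's distribution is derived from the shared counts rather than from a physical partition of the rows, the only extra space beyond the counts is the path of attribute tests itself, which is consistent with the O(n) additional-space claim above.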