Parallel univariate decision trees

Authors:
Olcay Taner Yıldız; Onur Dikmen
Affiliations:
Department of Computer Engineering, Işık University, Şile, İstanbul, Turkey; Department of Computer Engineering, Boğaziçi University, Bebek, İstanbul, Turkey
Venue:
Pattern Recognition Letters
Year:
2007

Abstract

Univariate decision tree algorithms are widely used in data mining because (i) they are easy to learn and (ii) once trained, they can be expressed as a set of rules. In several applications, data mining in particular, the dataset to be learned is very large, so it is highly desirable to construct univariate decision trees in reasonable time. This can be accomplished by parallelizing univariate decision tree algorithms. In this paper, we first present two univariate decision tree algorithms: C4.5 and the univariate linear discriminant tree. We then show how to parallelize these algorithms in three ways: (i) feature-based, (ii) node-based, and (iii) data-based. Experimental results show that the performance of the parallelizations depends strongly on the dataset, and that the node-based parallelization achieves good speedups.
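As a rough illustration of the feature-based strategy, the sketch below assumes numeric features and binary splits of the form x[f] <= t, scored with C4.5's gain ratio (information gain divided by split information). It omits other C4.5 heuristics and uses midpoint thresholds as a simplification. The helper names `entropy`, `best_split_for_feature` and `parallel_best_split` are hypothetical, not taken from the paper. Each worker scores one feature independently and a master step keeps the best split; the node-based and data-based strategies are not shown.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split_for_feature(args):
    """Best binary split x[f] <= t on one feature, returned as (gain_ratio, f, t)."""
    X, y, f = args
    n = len(y)
    base = entropy(y)
    order = sorted(range(n), key=lambda i: X[i][f])  # sort instances by feature f
    best = (0.0, f, None)
    for k in range(1, n):
        lo, hi = X[order[k - 1]][f], X[order[k]][f]
        if lo == hi:
            continue  # a threshold can only fall between distinct values
        left = [y[i] for i in order[:k]]
        right = [y[i] for i in order[k:]]
        p = k / n
        gain = base - p * entropy(left) - (1 - p) * entropy(right)
        split_info = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        ratio = gain / split_info  # C4.5-style gain ratio
        if ratio > best[0]:
            best = (ratio, f, (lo + hi) / 2)  # midpoint threshold (simplification)
    return best


def parallel_best_split(X, y, workers=4):
    """Feature-based parallelism: score every feature concurrently, keep the best."""
    tasks = [(X, y, f) for f in range(len(X[0]))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(best_split_for_feature, tasks))
    return max(results, key=lambda r: r[0])  # master step: pick the winning feature


# Toy data: feature 0 separates the classes perfectly, feature 1 does not.
X = [[1.0, 3.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = ["a", "a", "b", "b"]
print(parallel_best_split(X, y))  # (1.0, 0, 2.5)
```

A thread pool keeps the sketch portable, but CPython's global interpreter lock limits speedup for CPU-bound work. Swapping in `concurrent.futures.ProcessPoolExecutor`, which offers the same `map` interface, would give true parallelism at the cost of copying the data to each worker process.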