Parallel Formulations of Decision-Tree Classification Algorithms

  • Authors:
  • Anurag Srivastava, Eui-Hong Han, Vipin Kumar, Vineet Singh

  • Affiliations:
  • Anurag Srivastava: Digital Impact (anurag@digital-impact.com)
  • Eui-Hong Han: Department of Computer Science & Engineering, Army HPC Research Center, University of Minnesota (han@cs.umn.edu)
  • Vipin Kumar: Department of Computer Science & Engineering, Army HPC Research Center, University of Minnesota (kumar@cs.umn.edu)
  • Vineet Singh: Information Technology Lab, Hitachi America, Ltd. (vsingh@hitachi.com)

  • Venue: Data Mining and Knowledge Discovery
  • Year: 1999

Abstract

Classification decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing and fraud detection. Highly parallel algorithms for constructing classification decision trees are desirable for dealing with large data sets in a reasonable amount of time. Algorithms for building classification decision trees have a natural concurrency, but are difficult to parallelize due to the inherent dynamic nature of the computation. In this paper, we present parallel formulations of classification decision tree learning algorithms based on induction. We describe two basic parallel formulations: one based on the Synchronous Tree Construction Approach and the other based on the Partitioned Tree Construction Approach. We discuss the advantages and disadvantages of these methods and propose a hybrid method that combines their good features. We also analyze the computation and communication costs of the proposed hybrid method. Moreover, experimental results on an IBM SP-2 demonstrate excellent speedups and scalability.
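The abstract only outlines the two formulations; as a rough illustration (not the authors' implementation), the following Python sketch mimics the idea behind the Synchronous Tree Construction Approach: each "processor" holds a horizontal slice of the training records, local class histograms for every candidate split are summed across processors (the step a real implementation would perform with an MPI all-reduce), and one best split is then chosen globally. All names here (gini, local_histograms, synchronous_best_split, the toy data) are hypothetical and chosen only for this sketch.

```python
# Illustrative sketch only: a serial simulation of a synchronous
# tree-construction step.  Not the paper's code; a real parallel version
# would replace the inner loop over partitions with an MPI all-reduce.
from collections import Counter


def gini(counts):
    """Gini impurity of a class-count dictionary."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def local_histograms(partition, feature, threshold):
    """Class histograms of one processor's records for a candidate split."""
    left, right = Counter(), Counter()
    for x, label in partition:
        (left if x[feature] <= threshold else right)[label] += 1
    return left, right


def synchronous_best_split(partitions, candidate_tests):
    """All processors cooperate on the same tree node: local histograms are
    combined (the communication step) and a single split is selected."""
    best = None
    for feature, threshold in candidate_tests:
        left, right = Counter(), Counter()
        for part in partitions:          # stands in for the all-reduce
            l, r = local_histograms(part, feature, threshold)
            left += l
            right += r
        n = sum(left.values()) + sum(right.values())
        score = (sum(left.values()) * gini(left) +
                 sum(right.values()) * gini(right)) / n
        if best is None or score < best[0]:
            best = (score, feature, threshold)
    return best


if __name__ == "__main__":
    # Two "processors", each holding a horizontal slice of the data.
    partitions = [
        [((2.0, 1.0), "A"), ((3.5, 0.0), "B")],
        [((1.0, 2.0), "A"), ((4.0, 1.5), "B")],
    ]
    tests = [(0, 2.5), (1, 1.2)]         # (feature index, threshold) candidates
    print(synchronous_best_split(partitions, tests))
```

By contrast, under the Partitioned Tree Construction Approach described in the abstract, different processors (or groups of processors) would take ownership of different subtrees once the frontier grows large enough, avoiding the per-node synchronization shown above at the cost of redistributing data.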