Parallel Implementation of Decision Tree Learning Algorithms

Authors:
Nuno Amado; Joao Gama; Fernando M. A. Silva
Venue:
EPIA '01 Proceedings of the 10th Portuguese Conference on Artificial Intelligence on Progress in Artificial Intelligence, Knowledge Extraction, Multi-agent Systems, Logic Programming and Constraint Solving
Year:
2001

Abstract

In data mining and machine learning, the amount of data available for building classifiers is growing rapidly. There is therefore a strong need for algorithms that can build classifiers from very large datasets while remaining computationally efficient and scalable. One possible solution is to employ parallelism to reduce the time spent building classifiers from very large datasets while preserving classification accuracy. This work first surveys strategies for implementing decision tree construction algorithms in parallel, based on task parallelism, data parallelism, and hybrid parallelism. We then describe a new parallel implementation of the C4.5 decision tree construction algorithm. Although the implementation is still in its final development phase, we present experimental results that can be used to predict the expected behavior of the algorithm.
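
To make the idea of data parallelism concrete, the following is a minimal sketch and not the implementation described in the paper. It assumes the training examples are split horizontally into partitions, that each worker counts (attribute value, class) pairs on its own partition, and that the partial counts are then merged to evaluate a candidate split. The function names (`local_counts`, `information_gain`), the row layout with the class label in the last position, and the use of threads in place of separate processors are all illustrative assumptions. For brevity the sketch computes plain information gain for a nominal attribute, whereas C4.5 normally ranks splits by gain ratio.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import log2


def local_counts(partition, attr_index):
    """Count (attribute value, class label) pairs in one data partition.

    Assumes each row is a sequence whose last element is the class label.
    """
    counts = Counter()
    for row in partition:
        counts[(row[attr_index], row[-1])] += 1
    return counts


def entropy(class_counts):
    """Shannon entropy, in bits, of a mapping from class label to count."""
    total = sum(class_counts.values())
    return -sum((n / total) * log2(n / total) for n in class_counts.values() if n)


def information_gain(partitions, attr_index, workers=2):
    """Information gain of a nominal attribute, computed from per-partition counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker scans only its own partition (threads stand in for processors).
        partials = pool.map(lambda part: local_counts(part, attr_index), partitions)
        # Reduction step: merge the partial counts into global statistics.
        merged = sum(partials, Counter())

    class_totals = Counter()
    by_value = {}
    for (value, label), n in merged.items():
        class_totals[label] += n
        by_value.setdefault(value, Counter())[label] += n

    total = sum(class_totals.values())
    # Weighted entropy of the subsets induced by each attribute value.
    remainder = sum(sum(c.values()) / total * entropy(c) for c in by_value.values())
    return entropy(class_totals) - remainder
```

Only the small count tables cross partition boundaries here, never the raw examples, which is the property that makes this style of parallelism attractive for very large datasets.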