ScalParC: A New Scalable and Efficient Parallel Classification Algorithm for Mining Large Datasets

Authors:
Affiliations:
Venue:
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Year:
1998


Abstract

In this paper, we present ScalParC (Scalable Parallel Classifier), a new parallel formulation of a decision-tree-based classification process. Like other state-of-the-art decision tree classifiers such as SPRINT, ScalParC is suited for handling large datasets. We show that the existing parallel formulation of SPRINT is unscalable, whereas ScalParC is scalable in both runtime and memory requirements. We present experimental results of classifying up to 6.4 million records on up to 128 processors of a Cray T3D to demonstrate the scalable behavior of ScalParC. A key component of ScalParC is the parallel hash table. The proposed parallel hashing paradigm can be used to parallelize other algorithms that require many concurrent updates to a large hash table.
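
The abstract does not spell out how the parallel hash table works, so the following is only a minimal sketch of one common way to batch many concurrent updates to a hash table that is partitioned across processors. It is not taken from the paper. The processors are simulated as list entries inside a single Python process, and the placement rule `owner` (key modulo processor count) and the helper names `bucket_updates`, `all_to_all`, `parallel_update`, and `lookup` are all hypothetical choices made for illustration.

```python
def owner(key, num_procs):
    """Assumed placement rule: processor that stores this key."""
    return key % num_procs


def bucket_updates(local_updates, num_procs):
    """Group one processor's (key, value) updates by destination processor."""
    send_buffers = [[] for _ in range(num_procs)]
    for key, value in local_updates:
        send_buffers[owner(key, num_procs)].append((key, value))
    return send_buffers


def all_to_all(send_matrix):
    """Simulated all-to-all exchange: recv[dst][src] = send[src][dst]."""
    num_procs = len(send_matrix)
    return [[send_matrix[src][dst] for src in range(num_procs)]
            for dst in range(num_procs)]


def parallel_update(tables, updates_per_proc):
    """Apply every processor's updates in one batched exchange step."""
    num_procs = len(tables)
    send_matrix = [bucket_updates(u, num_procs) for u in updates_per_proc]
    recv_matrix = all_to_all(send_matrix)
    # Each processor applies only the updates for keys it owns.
    for rank in range(num_procs):
        for batch in recv_matrix[rank]:
            for key, value in batch:
                tables[rank][key] = value
    return tables


def lookup(tables, key):
    """Read a key from the partition that owns it (None if absent)."""
    return tables[owner(key, len(tables))].get(key)


# Four simulated processors, each producing updates for arbitrary keys.
tables = [dict() for _ in range(4)]
updates_per_proc = [
    [(0, "left"), (5, "right")],
    [(2, "left")],
    [(7, "right"), (4, "left")],
    [],
]
parallel_update(tables, updates_per_proc)
```

In this sketch each processor touches only its own partition when applying updates, so no locking is needed; the cost is moved into the single exchange step. On a real distributed-memory machine the simulated `all_to_all` would be replaced by a message-passing collective.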