In this paper, we present ScalParC (Scalable Parallel Classifier), a new parallel formulation of a decision-tree-based classification process. Like other state-of-the-art decision tree classifiers such as SPRINT, ScalParC is suited to handling large datasets. We show that the existing parallel formulation of SPRINT is unscalable, whereas ScalParC scales in both runtime and memory requirements. To demonstrate this scalable behavior, we present experimental results from classifying up to 6.4 million records on up to 128 processors of a Cray T3D. A key component of ScalParC is the parallel hash table. The proposed parallel hashing paradigm can be used to parallelize other algorithms that require many concurrent updates to a large hash table.
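
The abstract does not spell out the hashing scheme, so the following is only a generic sketch of one common way to handle many concurrent updates to a large distributed hash table: each processor buckets its updates by the processor that owns the key, the buckets are exchanged in one all-to-all step, and each owner applies what it receives locally. The names `owner_of` and `parallel_hash_update`, the modulo placement rule, and the single-process simulation of the exchange are all assumptions made for illustration, not the authors' implementation.

```python
def owner_of(key, num_procs):
    """Hypothetical placement rule: integer keys are assigned to owners by modulo."""
    return key % num_procs


def parallel_hash_update(local_updates, num_procs):
    """Simulate one round of batched updates to a distributed hash table.

    local_updates[p] is the list of (key, value) pairs generated on processor p.
    Returns tables, where tables[q] is the part of the table owned by processor q.
    """
    # Step 1: every processor sorts its updates into one send buffer per owner.
    send_buffers = [[[] for _ in range(num_procs)] for _ in range(num_procs)]
    for p, updates in enumerate(local_updates):
        for key, value in updates:
            send_buffers[p][owner_of(key, num_procs)].append((key, value))

    # Step 2: all-to-all exchange (simulated here by direct indexing);
    # each owner q applies the buffers addressed to it, with no remote writes.
    tables = [dict() for _ in range(num_procs)]
    for q in range(num_procs):
        for p in range(num_procs):
            for key, value in send_buffers[p][q]:
                tables[q][key] = value
    return tables


# Example: two processors record which child ("L" or "R") each record id goes to.
updates = [[(0, "L"), (5, "R")], [(2, "L"), (3, "R")]]
tables = parallel_hash_update(updates, num_procs=2)
```

In a real message-passing program the second step would be a collective exchange rather than a nested loop; batching updates per owner keeps the number of messages proportional to the number of processors instead of the number of updates.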