With recent technological advances, shared memory parallel machines have become more scalable and now offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelizing data mining algorithms: full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of popular data mining algorithms. In addition, we propose a reduction-object-based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the techniques we have developed, starting from a common specification of the algorithm. We have carried out a detailed evaluation of the parallelization techniques and the programming interface. We have experimented with Apriori and FP-tree-based association mining, k-means clustering, k-nearest neighbor classification, and decision tree construction. The main results from our experiments are as follows: 1) Among full replication, optimized full locking, and cache-sensitive locking, there is no clear winner. Each of these three techniques can outperform the others, depending upon machine and dataset parameters. These three techniques perform significantly better than the other two techniques. 2) Good parallel efficiency is achieved for each of the four algorithms we experimented with, using our techniques and runtime system. 3) The overhead of the interface is within 10 percent in almost all cases. 4) In the case of decision tree construction, combining different techniques turned out to be crucial for achieving high performance.
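As a rough illustration of two of the techniques named in the abstract, the sketch below contrasts full replication with full locking on a simple counting reduction. It is not the paper's runtime system or interface: the function names, the histogram workload, and the use of Python threads are all assumptions made for the example. The reduction object is assumed to be an array of counters. The other three locking variants are not shown.

```python
import threading


def _run_threads(worker, num_threads):
    """Start one thread per id, then wait for all of them (hypothetical helper)."""
    threads = [threading.Thread(target=worker, args=(tid,)) for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def full_replication(data, num_bins, num_threads):
    """Each thread updates a private copy of the reduction object; no locks are needed.

    The private copies are merged into a single result once all threads finish.
    """
    copies = [[0] * num_bins for _ in range(num_threads)]

    def worker(tid):
        local = copies[tid]
        # Strided partitioning of the input is an assumption of this sketch.
        for item in data[tid::num_threads]:
            local[item % num_bins] += 1

    _run_threads(worker, num_threads)
    # Merge step: sum the per-thread copies element by element.
    return [sum(column) for column in zip(*copies)]


def full_locking(data, num_bins, num_threads):
    """All threads share one reduction object; every element has its own lock."""
    shared = [0] * num_bins
    locks = [threading.Lock() for _ in range(num_bins)]

    def worker(tid):
        for item in data[tid::num_threads]:
            b = item % num_bins
            with locks[b]:  # protect only the element being updated
                shared[b] += 1

    _run_threads(worker, num_threads)
    return shared


if __name__ == "__main__":
    data = list(range(1000))
    print(full_replication(data, num_bins=8, num_threads=4))
    print(full_locking(data, num_bins=8, num_threads=4))
```

Both functions produce the same counts. The trade-off the sketch is meant to show is that replication pays for extra memory and a merge step, while locking pays for lock storage and an acquire/release on every update.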