A Super-Programming Approach for Mining Association Rules in Parallel on PC Clusters

Authors:
Dejiang Jin;Sotirios G. Ziavras
Affiliations:
-;IEEE
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2004

Abstract

PC clusters have become popular for parallel processing. Because they do not use specialized interprocessor networks, their data communication latency is relatively high. The programming models for PC clusters often differ from those for parallel machines or supercomputers that contain sophisticated interprocessor communication networks. On PC clusters, load balancing among the nodes becomes a more critical issue in achieving high performance. We introduce a new model for program development on PC clusters, the Super-Programming Model (SPM). The workload is modeled as a collection of Super-Instructions (SIs). We propose that a set of SIs be designed for each application domain; the set should consist of orthogonal, frequently used high-level operations in that domain. Each SI should normally be implemented as a high-level language routine that can execute on any PC. Application programs are modeled as Super-Programs (SPs), which are coded using SIs. SIs are dynamically assigned to available PCs at runtime. Because the granularity of SIs is known, an upper bound on their execution time can be estimated statically, which makes dynamic load balancing easier. Our motivation is to support dynamic load balancing and code porting, especially for applications with diverse sets of inputs, such as data mining. Here we apply SPM to the implementation of an Apriori-like algorithm for mining association rules. Our experiments show that the average idle time per node is kept very low.
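
As a rough illustration of the idea described in the abstract, the sketch below casts an Apriori-like miner as a super-program that issues one support-counting super-instruction per data block. It is not the authors' implementation: the names `count_si`, `next_candidates`, and `mine` are hypothetical, the block size is arbitrary, and a local thread pool stands in for the PCs of a cluster. The only property it tries to show is that each SI has a bounded amount of work and is handed to whichever worker is free.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_si(candidates, block):
    """Hypothetical super-instruction: count candidate itemsets in one block.

    Its work is bounded by len(candidates) * len(block) subset tests, the kind
    of upper bound that can be estimated before the SI is dispatched.
    """
    counts = Counter()
    for transaction in block:
        for cand in candidates:
            if cand <= transaction:
                counts[cand] += 1
    return counts


def next_candidates(frequent, k):
    """Apriori step: keep k-itemsets whose (k-1)-subsets are all frequent."""
    items = sorted({item for itemset in frequent for item in itemset})
    return [
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    ]


def mine(transactions, min_support, block_size=2, workers=3):
    """Hypothetical super-program: one count_si per block in every pass."""
    data = [frozenset(t) for t in transactions]
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    candidates = [frozenset([i]) for i in sorted({i for t in data for i in t})]
    result, k = {}, 1
    # Assumption: worker threads stand in for cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while candidates:
            total = Counter()
            # Queued SIs are picked up by whichever worker becomes idle.
            per_block = pool.map(count_si, [candidates] * len(blocks), blocks)
            for block_counts in per_block:
                total.update(block_counts)
            frequent = {c: n for c, n in total.items() if n >= min_support}
            result.update(frequent)
            k += 1
            candidates = next_candidates(set(frequent), k)
    return result
```

Calling `mine([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]], min_support=3)` returns the frequent itemsets with their support counts. A smaller `block_size` gives finer-grained SIs, which in this sketch trades more dispatch overhead for less waiting at the end of each pass.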