An efficient parallel and distributed algorithm for counting frequent sets

Authors:
Salvatore Orlando;Paolo Palmerini;Raffaele Perego;Fabrizio Silvestri
Affiliations:
Dipartimento di Informatica, Università Ca' Foscari, Venezia, Italy;Dipartimento di Informatica, Università Ca' Foscari, Venezia, Italy and Istituto CNUCE, Consiglio Nazionale delle Ricerche, Pisa, Italy;Istituto CNUCE, Consiglio Nazionale delle Ricerche, Pisa, Italy;Istituto CNUCE, Consiglio Nazionale delle Ricerche, Pisa, Italy and Dipartimento di Informatica, Università di Pisa, Italy
Venue:
VECPAR'02 Proceedings of the 5th international conference on High performance computing for computational science
Year:
2002

Abstract

Due to the huge increase in the number and dimension of available databases, efficient solutions for counting frequent sets are now very important within the Data Mining community. Several sequential and parallel algorithms have been proposed, which in many cases exhibit excellent scalability. In this paper we present ParDCI, a distributed and multithreaded algorithm for counting the occurrences of frequent sets within transactional databases. ParDCI is a parallel version of DCI (Direct Count & Intersect), a multi-strategy algorithm that adapts its behavior not only to the features of the specific computing platform (e.g. available memory), but also to the features of the dataset being processed (e.g. sparse or dense datasets). ParDCI enhances previous proposals by exploiting the highly optimized counting and intersection techniques of DCI, and by relying on a multi-level parallelization approach that explicitly targets clusters of SMPs, an emerging computing platform. We focused our work on the efficient exploitation of the underlying architecture. Intra-node multithreading effectively exploits the memory hierarchies of each SMP node, while inter-node parallelism exploits smart partitioning techniques aimed at reducing communication overheads. In-depth experimental evaluations demonstrate that ParDCI reaches nearly optimal performance under a variety of conditions.
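
To make the idea of counting frequent sets by intersection concrete, the following is a minimal sketch in Python. It is not the DCI or ParDCI implementation, which the abstract does not detail. It assumes a plain level-wise search over a vertical layout, in which each item maps to the set of transaction ids that contain it. The function name `frequent_itemsets`, the `min_support` parameter, and the toy data are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Illustrative assumption: a simple level-wise search using tid-set
    intersection. This is not the DCI/ParDCI code.
    """
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    # Level 1: keep only the frequent single items.
    level = {
        frozenset([item]): tids
        for item, tids in tidsets.items()
        if len(tids) >= min_support
    }

    result = {}
    while level:
        for itemset, tids in level.items():
            result[itemset] = len(tids)

        # Join frequent k-itemsets that differ in exactly one item. The
        # support of the union is the size of the tid-set intersection.
        next_level = {}
        for (a, tids_a), (b, tids_b) in combinations(level.items(), 2):
            candidate = a | b
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            tids = tids_a & tids_b
            if len(tids) >= min_support:
                next_level[candidate] = tids
        level = next_level
    return result


if __name__ == "__main__":
    toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(
        frequent_itemsets(toy, 3).items(), key=lambda kv: sorted(kv[0])
    ):
        print(sorted(itemset), support)
```

On the toy data every single item and every pair meets the threshold of 3, while the triple `{a, b, c}` occurs in only two transactions and is pruned. The vertical layout lets each support be computed with one set intersection instead of a new pass over the transactions.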