Owing to the huge increase in the number and size of available databases, efficient solutions for counting frequent sets have become very important to the Data Mining community. Several sequential and parallel algorithms have been proposed, many of which exhibit excellent scalability. In this paper we present ParDCI, a distributed and multithreaded algorithm for counting the occurrences of frequent sets within transactional databases. ParDCI is a parallel version of DCI (Direct Count & Intersect), a multi-strategy algorithm that adapts its behavior not only to the features of the specific computing platform (e.g. available memory), but also to the features of the dataset being processed (e.g. sparse or dense datasets). ParDCI improves on previous proposals by exploiting the highly optimized counting and intersection techniques of DCI, and by relying on a multi-level parallelization approach that explicitly targets clusters of SMPs, an emerging computing platform. Our work focuses on the efficient exploitation of the underlying architecture. Intra-node multithreading effectively exploits the memory hierarchy of each SMP node, while inter-node parallelism relies on smart partitioning techniques aimed at reducing communication overheads. In-depth experimental evaluations demonstrate that ParDCI achieves nearly optimal performance under a variety of conditions.
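To make the idea of intersection-based support counting concrete, the following is a minimal Python sketch. It is not the DCI or ParDCI implementation, whose details the abstract does not give. It assumes a vertical layout in which each item is stored as a bitmap over transaction identifiers, so that the support of an itemset is the number of bits set in the bitwise AND of its items' bitmaps. The function names (`build_bitmaps`, `support`, `frequent_pairs`) and the toy dataset are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Map each item to an integer whose bit t is set when transaction t contains the item."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps):
    """Count the transactions containing every item, by intersecting (AND-ing) the bitmaps."""
    items = iter(itemset)
    acc = bitmaps.get(next(items), 0)
    for item in items:
        acc &= bitmaps.get(item, 0)
    return bin(acc).count("1")  # number of set bits = number of matching transactions


def frequent_pairs(transactions, min_support):
    """Return every 2-itemset whose support reaches min_support (illustrative only)."""
    bitmaps = build_bitmaps(transactions)
    # Only items that are frequent on their own can belong to a frequent pair.
    frequent_items = sorted(
        item for item, bits in bitmaps.items()
        if bin(bits).count("1") >= min_support
    )
    result = {}
    for pair in combinations(frequent_items, 2):
        count = support(pair, bitmaps)
        if count >= min_support:
            result[pair] = count
    return result


# Hypothetical toy dataset: four transactions over the items a, b, c, d.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c"},
]

if __name__ == "__main__":
    print(frequent_pairs(transactions, min_support=2))
```

In a sketch like this, bitmaps tend to suit dense datasets, where most bits are set and a single AND covers many transactions at once. A sparse dataset would more plausibly favor a different representation, which is the kind of dataset-dependent choice the abstract attributes to DCI.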