pcApriori: scalable apriori for multiprocessor systems

Authors:
Benjamin Schlegel;Tim Kiefer;Thomas Kissinger;Wolfgang Lehner
Affiliations:
Database Technology Group, TU Dresden, Dresden, Germany;Database Technology Group, TU Dresden, Dresden, Germany;Database Technology Group, TU Dresden, Dresden, Germany;Database Technology Group, TU Dresden, Dresden, Germany
Venue:
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Year:
2013


Abstract

Frequent-itemset mining is an important part of data mining. It is a computationally and memory-intensive task with a large number of scientific and statistical application areas. In many of them, datasets can easily grow to tens or even several hundreds of gigabytes. Hence, efficient algorithms are required to process such amounts of data. In recent years, many efficient sequential mining algorithms have been proposed; however, they cannot exploit current and future systems that provide large degrees of parallelism. In contrast, the number of parallel frequent-itemset mining algorithms is rather small, and most of them do not scale well as the number of threads increases substantially. In this paper, we present a highly scalable mining algorithm that is based on the well-known Apriori algorithm; it is optimized for processing very large datasets on multiprocessor systems. The key idea of pcApriori is to employ a modified producer-consumer processing scheme, which partitions the data during processing and distributes it to the available threads. In experiments on large datasets, pcApriori scales almost linearly on our test system comprising 32 cores.
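
The abstract only names the producer-consumer idea, so the following is a minimal illustrative sketch of how support counting for one Apriori pass could be organized that way. It is not the pcApriori implementation: the function name `count_support`, the parameters `num_consumers` and `chunk_size`, and the use of a bounded queue with per-thread counters are all assumptions made for illustration. Note also that CPython threads do not run CPU-bound work in parallel, so the sketch shows the structure of the scheme rather than its performance.

```python
import queue
import threading
from collections import Counter


def count_support(transactions, candidates, num_consumers=4, chunk_size=2):
    """Count in how many transactions each candidate itemset occurs.

    Hypothetical helper: one producer thread partitions the transactions
    into chunks, and consumer threads count candidates chunk by chunk.
    """
    work = queue.Queue(maxsize=2 * num_consumers)  # bounded buffer
    partial_counts = [Counter() for _ in range(num_consumers)]
    candidate_sets = [frozenset(c) for c in candidates]

    def producer():
        # Partition the data and hand the chunks to whichever consumer is free.
        for start in range(0, len(transactions), chunk_size):
            work.put(transactions[start:start + chunk_size])
        for _ in range(num_consumers):
            work.put(None)  # one stop signal per consumer

    def consumer(worker_id):
        # Each consumer updates only its own counter, so no locking is needed.
        local = partial_counts[worker_id]
        while True:
            chunk = work.get()
            if chunk is None:
                break
            for transaction in chunk:
                items = set(transaction)
                for cand in candidate_sets:
                    if cand <= items:  # candidate is contained in transaction
                        local[cand] += 1

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer, args=(i,))
                for i in range(num_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Merge the per-thread counters into the final support counts.
    total = Counter()
    for local in partial_counts:
        total.update(local)
    return total


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"],
            ["b", "c"], ["a", "b", "c"], ["a", "b"]]
    cands = [("a", "b"), ("a", "c"), ("b", "c")]
    for itemset, support in count_support(data, cands).items():
        print(sorted(itemset), support)
```

Keeping a private counter per consumer and merging once at the end avoids synchronizing on shared counts during the scan; the bounded queue keeps the producer from running far ahead of the consumers. Both are design choices of this sketch, not claims about the paper's method.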