pcApriori: scalable apriori for multiprocessor systems

  • Authors:
  • Benjamin Schlegel;Tim Kiefer;Thomas Kissinger;Wolfgang Lehner

  • Affiliations:
  • Database Technology Group, TU Dresden, Dresden, Germany;Database Technology Group, TU Dresden, Dresden, Germany;Database Technology Group, TU Dresden, Dresden, Germany;Database Technology Group, TU Dresden, Dresden, Germany

  • Venue:
  • Proceedings of the 25th International Conference on Scientific and Statistical Database Management
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Frequent-itemset mining is an important part of data mining. It is a computational and memory intensive task and has a large number of scientific and statistical application areas. In many of them, the datasets can easily grow up to tens or even several hundred gigabytes of data. Hence, efficient algorithms are required to process such amounts of data. In the recent years, there have been proposed many efficient sequential mining algorithms, which however cannot exploit current and future systems providing large degrees of parallelism. Contrary, the number of parallel frequent-itemset mining algorithms is rather small and most of them do not scale well as the number of threads is largely increased. In this paper, we present a highly-scalable mining algorithm that is based on the well-known Apriori algorithm; it is optimized for processing very large datasets on multiprocessor systems. The key idea of pcApriori is to employ a modified producer--consumer processing scheme, which partitions the data during processing and distributes it to the available threads. We conduct many experiments on large datasets. pcApriori scales almost linear on our test system comprising 32 cores.