A high-performance distributed algorithm for mining association rules

  • Authors:
  • Assaf Schuster; Ran Wolff; Dan Trock

  • Affiliations:
  • Technion—Israel Institute of Technology, Department of Computer Science, 32000, Haifa, Israel; Technion—Israel Institute of Technology, Department of Computer Science, 32000, Haifa, Israel; Technion—Israel Institute of Technology, Department of Electrical Engineering, 32000, Haifa, Israel

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2005

Abstract

We present a new distributed association rule mining (D-ARM) algorithm that demonstrates superlinear speed-up with the number of computing nodes. The algorithm is the first D-ARM algorithm to perform a single scan over the database. As such, its performance is unmatched by any previous algorithm. Scale-up experiments over standard synthetic benchmarks demonstrate stable run time regardless of the number of computers. Theoretical analysis reveals a tighter bound on error probability than the one shown for the corresponding sequential algorithm. As a result of this tighter bound, and by utilizing the combined memory of several computers, the algorithm generates far fewer candidates than comparable sequential algorithms, on the same order of magnitude as the optimum.
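The abstract does not describe the algorithm's internals, so the following is only a generic illustration of the basic D-ARM counting step, not the authors' method. It assumes a horizontally partitioned database in which each node scans its local transactions once and counts a fixed set of candidate itemsets; the global support of an itemset is then the sum of its local counts. The function names `local_counts` and `global_frequent` are hypothetical, the nodes are simulated sequentially, and candidate generation, network communication, and the paper's error-probability analysis are all omitted.

```python
from collections import Counter


def local_counts(partition, candidates):
    """Single pass over one node's local transactions, counting each candidate itemset."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:  # candidate is a subset of this transaction
                counts[cand] += 1
    return counts


def global_frequent(partitions, candidates, min_support):
    """Sum the local counts from every (simulated) node and keep itemsets whose
    global count reaches min_support, given as a fraction of all transactions."""
    total = Counter()
    n_transactions = 0
    for part in partitions:
        total.update(local_counts(part, candidates))
        n_transactions += len(part)
    threshold = min_support * n_transactions
    return {c: total[c] for c in candidates if total[c] >= threshold}


# Hypothetical example: two nodes, each holding part of the transaction database.
partitions = [
    [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}],
    [{"b", "c"}, {"a", "b"}],
]
candidates = [frozenset(s) for s in ({"a"}, {"b"}, {"c"}, {"a", "b"}, {"b", "c"})]
result = global_frequent(partitions, candidates, 0.6)
```

With five transactions and a 60% threshold, an itemset needs a global count of at least 3, so `{b, c}` (count 2) is dropped while `{a}`, `{b}`, `{c}`, and `{a, b}` are kept. Because support counts are additive across disjoint partitions, each node only needs to ship its local counts rather than its raw transactions.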