Partitioning strategies for distributed association rule mining

  • Authors:
  • Frans Coenen; Paul Leng

  • Affiliations:
  • Department of Computer Science, University of Liverpool, Liverpool L69 3BX, UK. E-mail: frans@csc.liv.ac.uk, phl@csc.liv.ac.uk

  • Venue:
  • The Knowledge Engineering Review
  • Year:
  • 2006

Abstract

This paper investigates a number of alternative strategies for distributed/parallel association rule mining. The methods examined make use of the T-tree, a data structure introduced previously by the authors for organizing the sets of attributes whose support is being counted. We consider six different approaches, each representing a different way of parallelizing the basic Apriori-T algorithm that we use. The methods focus on different mechanisms for partitioning the data between processes and for reducing the message-passing overhead. Both ‘horizontal’ (data distribution) and ‘vertical’ (candidate distribution) partitioning strategies are considered, including a vertical partitioning algorithm (DATA-VP) that we have developed to exploit the structure of the T-tree. We present experimental results examining the performance of the methods in implementations using JavaSpaces. We conclude that, in a JavaSpaces environment, candidate distribution strategies offer better performance than those that distribute the original dataset because of their lower messaging overhead, and that the DATA-VP algorithm produces especially encouraging results.
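
The distinction between the two partitioning families can be illustrated with a small sketch. It is a toy example in Python, not the authors' Apriori-T, T-tree, DATA-VP, or JavaSpaces code: the transaction data, the function names (`candidate_pairs`, `count_support`, `horizontal`, `vertical`) and the round-robin split are all assumptions made for illustration, and the "workers" are simulated sequentially in one process. It only shows that splitting the rows and splitting the candidates give the same support counts.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, not taken from the paper).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "c", "d"},
    {"a", "b", "c", "d"},
]


def candidate_pairs(transactions):
    """Return every 2-itemset over the items seen in the data."""
    items = sorted(set().union(*transactions))
    return [frozenset(pair) for pair in combinations(items, 2)]


def count_support(candidates, transactions):
    """For each candidate itemset, count the transactions that contain it."""
    counts = {c: 0 for c in candidates}
    for row in transactions:
        for c in candidates:
            if c <= row:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts


def horizontal(transactions, candidates, n_workers):
    """Data distribution: each worker counts ALL candidates on a slice of rows."""
    total = {c: 0 for c in candidates}
    for w in range(n_workers):
        local_rows = transactions[w::n_workers]  # this worker's share of the data
        for c, n in count_support(candidates, local_rows).items():
            total[c] += n  # partial counts must be summed across workers
    return total


def vertical(transactions, candidates, n_workers):
    """Candidate distribution: each worker counts a slice of candidates on all rows."""
    total = {}
    for w in range(n_workers):
        local_candidates = candidates[w::n_workers]  # this worker's share of candidates
        total.update(count_support(local_candidates, transactions))  # counts are final
    return total


if __name__ == "__main__":
    cands = candidate_pairs(TRANSACTIONS)
    h = horizontal(TRANSACTIONS, cands, n_workers=2)
    v = vertical(TRANSACTIONS, cands, n_workers=2)
    print(h == v)
    print(h[frozenset({"a", "c"})])
```

In this sketch every horizontal worker reports a partial count for every candidate, whereas each vertical worker reports final counts for its own candidates only. That is one informal way to picture the messaging overhead the abstract refers to, though the sketch does not attempt to model how the paper's algorithms distribute data or exchange messages.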