Parallel frequent set counting

  • Authors: D. B. Skillicorn
  • Affiliations: Department of Computing and Information Science, Queen's University, Kingston, Canada
  • Venue: Parallel Computing - Parallel data-intensive algorithms and applications
  • Year: 2002

Abstract

Computing the frequent subsets of large multi-attribute data is a key component of data mining algorithms for local pattern detection. The task is both computation- and data-intensive. The standard parallel algorithms require multiple passes through the data, so the cost of data access may easily outweigh any performance gained by parallelizing the computation. We address two opportunities for performance improvement: using a parallel approximate algorithm that requires only a single pass over the data, and using a probabilistic technique to avoid generating most of the lattice of subsets implied by each object's data. The computation required is only slightly greater than that of levelwise algorithms, but the amount of data access is much smaller.
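
To make the data-access cost concrete, the following is a minimal sequential sketch of the standard levelwise (Apriori-style) baseline that the abstract contrasts against. It is not the approximate single-pass algorithm of the paper. The function name `levelwise_frequent_sets`, the `min_support` parameter, and the returned pass counter are assumptions introduced here for illustration only.

```python
from itertools import combinations


def levelwise_frequent_sets(transactions, min_support):
    """Levelwise baseline (illustrative sketch, not the paper's method).

    Returns (frequent, passes): a dict mapping each frequent itemset to its
    support count, and the number of full passes made over the data.
    """
    transactions = [frozenset(t) for t in transactions]
    frequent = {}
    passes = 0
    k = 1
    # Level-1 candidates: every single item that appears in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    while candidates:
        # Each level costs one complete pass over the data.
        passes += 1
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:  # candidate is a subset of this object's items
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-sets into (k+1)-candidates, then prune any candidate
        # that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, passes
```

In this sketch the number of passes grows with the size of the largest candidate set, which is the data-access cost a single-pass algorithm is meant to avoid. Note also that an object with m items implies a lattice of 2^m subsets, so enumerating every subset per object is impractical for wide data.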