Parallel Bifold: Large-scale parallel pattern mining with constraints

  • Authors: Mohammad El-Hajj; Osmar R. Zaïane

  • Affiliations: Department of Computing Science, University of Alberta, Edmonton, Canada (both authors)

  • Venue: Distributed and Parallel Databases
  • Year: 2006

Abstract

When it is computationally feasible at all, mining huge databases produces a tremendously large number of frequent patterns. In many cases it is impractical to mine such datasets because of their sheer size: not only the number of existing patterns but, above all, the magnitude of the search space. Many approaches have suggested either applying constraints to the patterns or searching for frequent patterns in parallel. So far, these approaches have not been genuinely effective for mining extremely large datasets.

We propose a method that combines both strategies efficiently, i.e., mining the set of patterns in parallel while pushing constraints. Using this approach we could mine significantly large datasets, with sizes never before reported in the literature. We are able to effectively discover frequent patterns in a database of a billion transactions using a 32-processor cluster in less than an hour and a half.
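
To make the idea of combining the two strategies concrete, the following is a minimal illustrative sketch in Python. It is not the authors' Parallel Bifold algorithm, whose details the abstract does not give; it is a generic level-wise (Apriori-style) miner substituted purely for illustration. Assumptions: transactions are split into partitions counted by a thread pool (threads stand in for the processors of a cluster), and the pushed constraint is a hypothetical anti-monotone "total price within budget" rule. The names `PRICE`, `within_budget`, `count_partition`, and `mine` are invented for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical item prices, used only by the example constraint below.
PRICE = {"a": 1, "b": 2, "c": 4, "d": 8}


def within_budget(itemset, budget=6):
    """Anti-monotone constraint: once an itemset exceeds the budget,
    every superset exceeds it too, so the whole branch can be pruned."""
    return sum(PRICE[item] for item in itemset) <= budget


def count_partition(args):
    """Count how often each candidate occurs in one partition of the data."""
    partition, candidates = args
    counts = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:  # candidate is a subset
                counts[candidate] += 1
    return counts


def mine(transactions, min_support, constraint, workers=4):
    """Level-wise frequent-itemset mining with partition-parallel counting.

    The constraint is checked during candidate generation ("pushed"),
    so violating itemsets are never counted at all.
    """
    partitions = [transactions[i::workers] for i in range(workers)]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([i]) for i in items if constraint(frozenset([i]))]
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while candidates:
            # Each worker counts its own partition; local counts are merged.
            total = Counter()
            jobs = [(p, candidates) for p in partitions]
            for local in pool.map(count_partition, jobs):
                total.update(local)
            frequent = {c: n for c, n in total.items() if n >= min_support}
            result.update(frequent)
            # Build the next level, keeping only candidates that satisfy
            # the constraint and whose sub-itemsets are all frequent.
            k = len(candidates[0]) + 1
            next_level = set()
            for a, b in combinations(list(frequent), 2):
                union = a | b
                if (len(union) == k and constraint(union)
                        and all(frozenset(s) in frequent
                                for s in combinations(union, k - 1))):
                    next_level.add(union)
            candidates = list(next_level)
    return result


# Toy usage: five transactions, minimum support of two.
transactions = [frozenset(t) for t in ("abc", "ab", "ac", "bc", "abd")]
patterns = mine(transactions, min_support=2, constraint=within_budget)
```

In this sketch, item `d` and the itemset `{a, b, c}` are discarded by the constraint before any counting happens, which is the point of pushing constraints into the search rather than filtering the results afterwards. A real deployment at the scale the abstract describes would distribute partitions across machines rather than threads.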