PFP: Parallel FP-Growth for Query Recommendation

  • Authors:
  • Haoyuan Li;Yi Wang;Dong Zhang;Ming Zhang;Edward Y. Chang

  • Affiliations:
  • Google Beijing Research, Beijing, China;Google Beijing Research, Beijing, China;Google Beijing Research, Beijing, China;Peking University, Beijing, China;Google Research, Mountain View, CA, USA

  • Venue:
  • Proceedings of the 2008 ACM conference on Recommender systems
  • Year:
  • 2008

Abstract

Frequent itemset mining (FIM) is a useful tool for discovering frequently co-occurrent items. Since its inception, a number of significant FIM algorithms have been developed to speed up mining performance. Unfortunately, when the dataset size is huge, both the memory use and computational cost can still be prohibitively expensive. In this work, we propose to parallelize the FP-Growth algorithm (we call our parallel algorithm PFP) on distributed machines. PFP partitions computation in such a way that each machine executes an independent group of mining tasks. Such partitioning eliminates computational dependencies between machines, and thereby communication between them. Through empirical study on a large dataset of 802,939 Web pages and 1,021,107 tags, we demonstrate that PFP can achieve virtually linear speedup. Besides scalability, the empirical study demonstrates PFP to be promising for supporting query recommendation for search engines.
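
The partitioning idea in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the function names (`build_shards`, `mine_shard`), the round-robin grouping rule, and the toy transactions are assumptions made for illustration, and brute-force itemset enumeration stands in for FP-Growth so the example stays short. The point it shows is that once items are assigned to groups, each group can receive its own shard of (truncated) transactions and be mined with no reference to any other shard.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_shards(transactions, min_support, num_groups):
    """Split transactions into independent, group-specific shards."""
    # Count item support and rank frequent items, most frequent first.
    counts = Counter(item for t in transactions for item in set(t))
    flist = sorted((i for i in counts if counts[i] >= min_support),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(flist)}
    # Assumed grouping rule (illustrative only): round-robin over ranked items.
    group_of = {item: r % num_groups for item, r in rank.items()}

    shards = defaultdict(list)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        seen = set()
        # Scan from the least frequent item. Each group receives the prefix
        # ending at its right-most item, at most once per transaction.
        for pos in range(len(ordered) - 1, -1, -1):
            gid = group_of[ordered[pos]]
            if gid not in seen:
                seen.add(gid)
                shards[gid].append(ordered[:pos + 1])
    return shards, group_of


def mine_shard(gid, shard, group_of, min_support):
    """Brute-force stand-in for FP-Growth on one shard (one 'machine')."""
    support = Counter()
    for t in shard:
        for size in range(1, len(t) + 1):
            for itemset in combinations(t, size):
                # Report only itemsets whose lowest-ranked item this group owns,
                # so no itemset is counted by two groups.
                if group_of[itemset[-1]] == gid:
                    support[itemset] += 1
    return {s: c for s, c in support.items() if c >= min_support}


# Toy data (hypothetical): each shard is mined on its own, then results are merged.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a"]]
shards, group_of = build_shards(transactions, min_support=2, num_groups=2)
frequent = {}
for gid, shard in shards.items():
    frequent.update(mine_shard(gid, shard, group_of, min_support=2))
```

Because every transaction containing an itemset contributes a prefix that still contains it to the shard of the group owning its lowest-ranked item, the per-shard counts equal the global counts, and the loop over `shards` could run on separate machines without exchanging data.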