PFP: Parallel FP-Growth for Query Recommendation

Authors:
Haoyuan Li;Yi Wang;Dong Zhang;Ming Zhang;Edward Y. Chang
Affiliations:
Google Beijing Research, Beijing, China;Google Beijing Research, Beijing, China;Google Beijing Research, Beijing, China;Peking University, Beijing, China;Google Research, Mountain View, CA, USA
Venue:
Proceedings of the 2008 ACM conference on Recommender systems
Year:
2008

Abstract

Frequent itemset mining (FIM) is a useful tool for discovering frequently co-occurring items. Since its inception, a number of significant FIM algorithms have been developed to speed up mining. Unfortunately, when the dataset is huge, both the memory use and the computational cost can still be prohibitively expensive. In this work, we propose to parallelize the FP-Growth algorithm on distributed machines; we call the parallel algorithm PFP. PFP partitions computation in such a way that each machine executes an independent group of mining tasks. Such partitioning eliminates computational dependencies between machines, and thereby communication between them. Through an empirical study on a large dataset of 802,939 Web pages and 1,021,107 tags, we demonstrate that PFP can achieve virtually linear speedup. Besides scalability, the empirical study demonstrates that PFP is promising for supporting query recommendation for search engines.
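
The abstract does not spell out how the independent mining tasks are formed, so the following is only a minimal sketch of one way such a partitioning could work, under stated assumptions. It assumes frequent items are ranked by support and assigned to groups round-robin, and that each transaction is sent to a group as the prefix ending at its last item belonging to that group. All function names (`build_groups`, `shard_transactions`, `mine_shard`, `pfp_sketch`) are hypothetical, and the per-shard miner is a brute-force stand-in for FP-Growth, suitable only for toy inputs.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_groups(transactions, min_support, num_groups):
    """Rank frequent items by descending support and assign each to a group."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    frequent = sorted(
        (i for i, c in counts.items() if c >= min_support),
        key=lambda i: (-counts[i], i),  # ties broken by item name
    )
    rank = {item: r for r, item in enumerate(frequent)}
    # Assumption: round-robin assignment of ranked items to groups.
    group_of = {item: r % num_groups for item, r in rank.items()}
    return rank, group_of


def shard_transactions(transactions, rank, group_of):
    """Send each group the prefix of a transaction ending at its last item in that group."""
    shards = defaultdict(list)
    for t in transactions:
        # Drop infrequent items and order the rest by rank.
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        seen = set()
        # Scan from the least frequent item; emit one prefix per group.
        for pos in range(len(items) - 1, -1, -1):
            g = group_of[items[pos]]
            if g not in seen:
                seen.add(g)
                shards[g].append(items[: pos + 1])
    return shards


def mine_shard(group_id, shard, group_of, min_support):
    """Brute-force stand-in for local FP-Growth (exponential; toy data only).

    Counts only itemsets whose lowest-ranked item belongs to this group,
    so no two groups ever report the same itemset.
    """
    counts = Counter()
    for t in shard:
        for pos, item in enumerate(t):
            if group_of[item] != group_id:
                continue
            for k in range(pos + 1):
                for combo in combinations(t[:pos], k):
                    counts[frozenset(combo + (item,))] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def pfp_sketch(transactions, min_support, num_groups):
    """Run the three steps sequentially; each shard could go to its own machine."""
    rank, group_of = build_groups(transactions, min_support, num_groups)
    shards = shard_transactions(transactions, rank, group_of)
    result = {}
    for g, shard in shards.items():
        # No shard reads another shard's data or results.
        result.update(mine_shard(g, shard, group_of, min_support))
    return result
```

Calling `pfp_sketch(transactions, min_support, num_groups)` on a list of item lists returns a dictionary mapping each frequent itemset (as a `frozenset`) to its support. Because every itemset is counted by exactly one group, and each shard already contains every prefix that group needs, the per-group loop involves no cross-group communication, which is the property the abstract attributes to PFP's partitioning.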