Frequent itemset mining (FIM) is a useful tool for discovering items that frequently co-occur. Since its inception, a number of significant FIM algorithms have been developed to speed up mining performance. Unfortunately, when the dataset is huge, both memory use and computational cost can still be prohibitively expensive. In this work, we propose to parallelize the FP-Growth algorithm on distributed machines; we call the parallel algorithm PFP. PFP partitions computation in such a way that each machine executes an independent group of mining tasks. Such partitioning eliminates computational dependencies between machines, and thereby communication between them. Through an empirical study on a large dataset of 802,939 Web pages and 1,021,107 tags, we demonstrate that PFP can achieve virtually linear speedup. Besides scalability, the empirical study demonstrates that PFP is promising for supporting query recommendation for search engines.
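The abstract does not spell out how the mining tasks are made independent, so the following is only a minimal sketch of one way such partitioning could work, not the paper's implementation. It assumes that frequent items are assigned to groups, that each transaction is cut into one frequency-ordered prefix per group, and that each group then mines only the itemsets whose least frequent item belongs to it. All function names (`count_items`, `assign_groups`, `group_dependent_shards`, `mine_group`, `pfp_sketch`) are hypothetical, and the per-group miner uses brute-force enumeration in place of FP-Growth to keep the example short.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_items(transactions):
    """Count in how many transactions each item occurs."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts


def assign_groups(counts, min_support, num_groups):
    """Rank frequent items by descending support and assign them to groups.

    Round-robin assignment is an assumption made for illustration only.
    """
    frequent = sorted(
        (item for item, c in counts.items() if c >= min_support),
        key=lambda item: (-counts[item], item),
    )
    rank = {item: r for r, item in enumerate(frequent)}
    group = {item: r % num_groups for item, r in rank.items()}
    return rank, group


def group_dependent_shards(transactions, rank, group):
    """Give each group the prefix of every transaction that it needs.

    Each transaction is ordered by rank; a group receives the prefix ending
    at the last (least frequent) item that belongs to that group.
    """
    shards = defaultdict(list)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        seen = set()
        for pos in range(len(ordered) - 1, -1, -1):
            g = group[ordered[pos]]
            if g not in seen:
                seen.add(g)
                shards[g].append(ordered[: pos + 1])
    return shards


def mine_group(shard, g, group, min_support):
    """Count itemsets whose least frequent item belongs to group g.

    Brute-force stand-in for a local FP-Growth run; exponential in the
    transaction length, so suitable for toy data only.
    """
    counts = Counter()
    for t in shard:
        for end, item in enumerate(t):
            if group[item] != g:
                continue
            prefix = t[:end]
            for k in range(len(prefix) + 1):
                for combo in combinations(prefix, k):
                    counts[frozenset(combo + (item,))] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def pfp_sketch(transactions, min_support, num_groups):
    """Run every group independently and merge the disjoint results."""
    counts = count_items(transactions)
    rank, group = assign_groups(counts, min_support, num_groups)
    shards = group_dependent_shards(transactions, rank, group)
    result = {}
    for g, shard in shards.items():  # each iteration could run on its own machine
        result.update(mine_group(shard, g, group, min_support))
    return result
```

Because every itemset is owned by exactly one group (the group of its least frequent item), the per-group results never overlap and can be merged without any exchange between groups during mining. For example, `pfp_sketch([["a", "b"], ["a", "b", "c"], ["a", "c"]], min_support=2, num_groups=2)` returns a dictionary mapping each frequent itemset to its support count.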