Specialized frameworks for highly scalable data processing continue to gain prominence over traditional databases in many environments, including the cloud. Perhaps the best-known such framework is Google's MapReduce, which has gained widespread popularity. However, the MapReduce model poses significant challenges for workload balancing that have not been adequately explored so far. In this paper, we introduce techniques for improving load balancing, in particular multi-stage jobs and dynamic partition assignment, based on a modified programming model that offers greater flexibility while retaining the simplicity, scalability, and fault tolerance of MapReduce. We then evaluate the effectiveness of our approach using a parallel frequent itemset mining algorithm.
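The abstract does not describe how dynamic partition assignment is carried out. As a rough illustration only, the sketch below assumes that intermediate data is split into more partitions than there are workers and that each partition's size is known or estimated before assignment. It contrasts a fixed modulo mapping (similar in spirit to static hash partitioning) with a greedy largest-first assignment to the currently least-loaded worker. The function names, the size estimates, and the greedy heuristic are hypothetical stand-ins, not the paper's algorithm.

```python
import heapq


def static_assignment(partition_sizes, num_workers):
    """Fixed mapping, like static hash partitioning: partition i -> worker i % num_workers.

    Returns the total load placed on each worker.
    """
    loads = [0] * num_workers
    for pid, size in enumerate(partition_sizes):
        loads[pid % num_workers] += size
    return loads


def dynamic_assignment(partition_sizes, num_workers):
    """Greedy largest-first assignment (a hypothetical heuristic, not the paper's method).

    Assumption: partition sizes are known or estimated before assignment.
    Each partition, largest first, goes to the worker with the smallest current load.
    Returns (assignment dict pid -> worker, per-worker loads).
    """
    heap = [(0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {}
    # Visit partitions in descending size order; ties keep their original order.
    for pid in sorted(range(len(partition_sizes)), key=lambda p: -partition_sizes[p]):
        load, worker = heapq.heappop(heap)  # least-loaded worker
        assignment[pid] = worker
        heapq.heappush(heap, (load + partition_sizes[pid], worker))
    loads = [0] * num_workers
    for pid, worker in assignment.items():
        loads[worker] += partition_sizes[pid]
    return assignment, loads


if __name__ == "__main__":
    # Skewed partition sizes, e.g. record counts per fine-grained partition.
    sizes = [50, 10, 10, 10, 40, 10, 10, 10]
    print("static loads: ", static_assignment(sizes, 2))
    print("dynamic loads:", dynamic_assignment(sizes, 2)[1])
```

With the skewed sizes above and two workers, the fixed mapping happens to place both large partitions on the same worker (loads of 110 and 40), while the greedy assignment yields loads of 80 and 70. The idea being illustrated is that decoupling partitions from workers lets a scheduler react to skew. A real system would also need to estimate sizes from sampled or partial map output and keep assignments deterministic so that failed tasks can be re-executed.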