Towards improved load balancing for data intensive distributed computing

Authors:
Sven Groot;Kazuo Goda;Masaru Kitsuregawa
Affiliations:
The University of Tokyo, Komaba Meguro-ku, Tokyo, Japan;The University of Tokyo, Komaba Meguro-ku, Tokyo, Japan;The University of Tokyo, Komaba Meguro-ku, Tokyo, Japan
Venue:
Proceedings of the 2011 ACM Symposium on Applied Computing
Year:
2011

Abstract

Specialized frameworks for highly scalable data processing continue to gain prominence over traditional databases in many environments, including the cloud. Perhaps the best-known such framework is Google MapReduce, which has gained widespread popularity. However, the MapReduce model poses significant challenges for workload balancing that have not been adequately explored so far. In this paper, we introduce techniques for improving load balancing, particularly multi-stage jobs and dynamic partition assignment, by using a modified programming model that offers greater flexibility while maintaining the simplicity, scalability, and fault tolerance of MapReduce. We then explore the effectiveness of our approach using a parallel frequent itemset mining algorithm.