Towards improved load balancing for data intensive distributed computing

  • Authors:
  • Sven Groot; Kazuo Goda; Masaru Kitsuregawa

  • Affiliations:
  • The University of Tokyo, Komaba Meguro-ku, Tokyo, Japan;The University of Tokyo, Komaba Meguro-ku, Tokyo, Japan;The University of Tokyo, Komaba Meguro-ku, Tokyo, Japan

  • Venue:
  • Proceedings of the 2011 ACM Symposium on Applied Computing
  • Year:
  • 2011

Abstract

Specialized frameworks for highly scalable data processing continue to gain prominence over traditional databases in many environments, including the cloud. Perhaps the best-known such framework is Google MapReduce, which has gained widespread popularity. However, the MapReduce model poses some significant challenges for workload balancing that have not yet been adequately explored. In this paper, we introduce techniques for improving load balancing, particularly multi-stage jobs and dynamic partition assignment, by using a modified programming model that offers greater flexibility but maintains the simplicity, scalability, and fault tolerance of MapReduce. We then explore the effectiveness of our approach using a parallel frequent itemset mining algorithm.