ADAPT: Availability-Aware MapReduce Data Placement for Non-dedicated Distributed Computing

  • Authors:
  • Hui Jin;Xi Yang;Xian-He Sun;Ioan Raicu

  • Venue:
  • ICDCS '12 Proceedings of the 2012 IEEE 32nd International Conference on Distributed Computing Systems
  • Year:
  • 2012

Abstract

The MapReduce programming paradigm has gained popularity for its ease of programming, data distribution, and fault tolerance. Its low barrier to adoption makes it a promising framework for non-dedicated distributed computing environments. However, variability in host resources and availability can substantially degrade the performance of MapReduce applications. Replication-based fault tolerance alleviates some of these problems, but at the cost of inefficient use of storage space. Running MapReduce applications in non-dedicated environments at lower cost requires intelligent solutions that guarantee application performance with a low degree of data replication. In this research, we propose an Availability-aware Data Placement (ADAPT) strategy that improves application performance without extra storage cost. The basic idea of ADAPT is to dispatch data based on the availability of each node, thereby reducing network traffic, improving data locality, and optimizing application performance. We implement a prototype of ADAPT within the Hadoop framework, an open-source implementation of MapReduce, and evaluate it in an emulated non-dedicated distributed environment. The experimental results show that ADAPT can improve performance by more than 30%. ADAPT achieves high reliability without the need for additional data replication. ADAPT has also been evaluated for large-scale computing environments through simulations, with promising results.
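
The idea of dispatching data according to node availability can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes that each node is summarized by a single availability estimate between 0 and 1 and that blocks are simply split in proportion to those estimates. The function name `place_blocks` and the node names are hypothetical.

```python
def place_blocks(num_blocks, availability):
    """Split num_blocks across nodes in proportion to their availability.

    availability maps a node name to an assumed availability estimate
    (for example, the fraction of time the node is expected to be usable).
    This is an illustrative heuristic, not the ADAPT placement algorithm.
    """
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    if any(a < 0 for a in availability.values()):
        raise ValueError("availability values must be non-negative")
    total = sum(availability.values())
    if total <= 0:
        raise ValueError("at least one node must have positive availability")

    # Ideal fractional share of blocks for each node.
    shares = {node: num_blocks * a / total for node, a in availability.items()}

    # Start from the integer part of each share.
    counts = {node: int(share) for node, share in shares.items()}

    # Give the remaining blocks to the nodes with the largest fractional parts,
    # so the counts always add up to num_blocks.
    leftover = num_blocks - sum(counts.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - counts[n], reverse=True)
    for node in by_remainder[:leftover]:
        counts[node] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical cluster: one mostly available node and two less available ones.
    print(place_blocks(8, {"node-a": 0.5, "node-b": 0.25, "node-c": 0.25}))
```

Under this assumption, a node that is available twice as often receives about twice as many blocks, so fewer blocks sit on hosts that are likely to be interrupted and fewer tasks need to fetch their input remotely.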