Distributed Results Checking for MapReduce in Volunteer Computing

  • Authors:
  • Mircea Moca; Gheorghe Cosmin Silaghi; Gilles Fedak

  • Venue:
  • IPDPSW '11 Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
  • Year:
  • 2011

Abstract

MapReduce is a promising approach to supporting data-intensive applications on Volunteer Computing systems. Existing middleware such as BitDew allows MapReduce applications to run in a Desktop Grid environment. If the Desktop Grid is deployed on the Internet under the Volunteer Computing paradigm, it harnesses untrusted, volatile and heterogeneous resources, and the results produced by MapReduce applications can be subject to sabotage. However, implementing MapReduce at large scale presents significant challenges with respect to the state of the art in Desktop Grids. A key issue is the design of result certification, the operation that verifies that malicious volunteers have not tampered with the results of computations. Because the volume of data produced and processed is too large to be sent back to the server, result certification cannot be centralized as it currently is in Desktop Grid systems. In this paper we present a distributed result checker based on the Majority Voting method. We evaluate the efficiency of our approach using a model that characterizes errors and sabotage in the MapReduce paradigm. With this model, we can compute the aggregated probability that a MapReduce implementation produces an erroneous result. The challenge is to capture this aggregated probability for the entire system, composed from the probabilities resulting from the two phases of computation: Map and Reduce. We provide a detailed analysis of the performance of the result verification method and also discuss the overhead of managing security. We also give guidelines on how the result verification phase should be configured for a given MapReduce application.
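The two ideas in the abstract can be illustrated with a minimal Python sketch: certifying a replicated task by Majority Voting, and aggregating per-phase error probabilities into a job-level figure. The sketch rests on simplifying assumptions that are ours, not necessarily the paper's model. Each replica returns a wrong result independently with probability p. Wrong replicas agree on the same bad value, which is the worst case of colluding saboteurs. Map and Reduce tasks fail independently of one another. The function names (`majority_vote`, `replica_error_rate`, `job_error_rate`) are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Hashable, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of replicas, else None.

    In a distributed checker the values would typically be result digests
    (e.g. hashes of Map outputs) rather than the full data; this is an
    illustrative assumption.
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count * 2 > len(results) else None


def replica_error_rate(n: int, p: float) -> float:
    """Probability that a strict majority of n replicas is wrong.

    Assumes each replica is wrong independently with probability p and that
    all wrong replicas return the same bad value (colluding saboteurs).
    Best suited to odd n, where two-valued outcomes cannot tie.
    """
    k_min = n // 2 + 1  # smallest number of wrong replicas that wins the vote
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def job_error_rate(n_map: int, n_reduce: int, eps_map: float, eps_reduce: float) -> float:
    """Probability that at least one Map or Reduce task is certified wrong.

    Hypothetical aggregation: the job is correct only if every Map task and
    every Reduce task is certified correctly, with tasks failing independently.
    """
    p_correct = (1 - eps_map) ** n_map * (1 - eps_reduce) ** n_reduce
    return 1 - p_correct


if __name__ == "__main__":
    eps_m = replica_error_rate(3, 0.1)  # Map tasks replicated 3 times
    eps_r = replica_error_rate(5, 0.1)  # Reduce tasks replicated 5 times
    print(majority_vote(["h1", "h1", "h2"]))
    print(job_error_rate(100, 10, eps_m, eps_r))
```

Under these assumptions the job-level error grows with the number of tasks, so a configuration guideline follows naturally. Raising the replication factor lowers each per-task error rate, and the phase with more tasks (usually Map) dominates the total.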