MRSG - A MapReduce simulator over SimGrid

  • Authors:
  • Wagner Kolberg; Pedro De B. Marcos; Julio C. S. Anjos; Alexandre K. S. Miyazaki; Claudio R. Geyer; Luciana B. Arantes

  • Affiliations:
  • Federal University of Rio Grande do Sul (UFRGS), Institute of Informatics - GPPD, Caixa Postal 15.064 - 91.501-970, Porto Alegre, RS, Brazil (Kolberg, Marcos, Anjos, Miyazaki, Geyer); Université Pierre et Marie Curie, CNRS INRIA - REGAL, 4 Place Jussieu, 75005 Paris, France (Arantes)

  • Venue:
  • Parallel Computing
  • Year:
  • 2013

Abstract

MapReduce is a parallel programming model for processing large datasets, inspired by the Map and Reduce primitives of functional languages. Its first implementation was designed to run on large clusters of homogeneous machines. In recent years, however, the model has been ported to other types of environments, such as desktop grids and volunteer computing. To achieve good performance in these environments, some framework mechanisms, such as the scheduling and data distribution algorithms, must be adapted. In this paper we present the MRSG simulator, which reproduces the MapReduce workflow on top of the SimGrid simulation toolkit and provides an API for implementing and evaluating such new algorithms and policies for MapReduce. To evaluate the simulator, we compared its behavior against a real Hadoop MapReduce deployment. The results show a strong similarity between the simulated and real executions.
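
For readers unfamiliar with the programming model the abstract refers to, the following is a minimal, single-process sketch of the map, shuffle, and reduce steps, using word counting as the job. It is an illustration only: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and are not part of the MRSG, SimGrid, or Hadoop APIs, and the sketch ignores everything the simulator is actually concerned with, such as scheduling, data distribution, and heterogeneous hosts.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Apply map_fn to every input record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def shuffle(pairs):
    """Group intermediate values by key, as happens between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Call reduce_fn once per key with all of that key's values."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def run_job(records, map_fn, reduce_fn):
    """Run the three steps in sequence (hypothetical driver, no parallelism)."""
    return reduce_phase(shuffle(map_phase(records, map_fn)), reduce_fn)


# Example job: word count.
def word_count_map(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    return [(word, 1) for word in line.split()]


def word_count_reduce(key, values):
    # Sum the per-occurrence counts emitted for this word.
    return sum(values)


if __name__ == "__main__":
    counts = run_job(["a b a", "b c"], word_count_map, word_count_reduce)
    print(counts)
```

In a real deployment the map and reduce calls run as tasks on many machines, and where those tasks are placed is exactly the kind of policy the abstract says must be adapted for desktop grid and volunteer computing environments.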