Evaluating the suitability of MapReduce for surface temperature analysis codes

  • Authors:
  • Vinay Sudhakaran; Neil P. Chue Hong

  • Affiliations:
  • University of Edinburgh, Edinburgh, United Kingdom; University of Edinburgh, Edinburgh, United Kingdom

  • Venue:
  • Proceedings of the second international workshop on Data intensive computing in the clouds
  • Year:
  • 2011

Abstract

Processing large volumes of scientific data requires an efficient and scalable parallel computing framework to obtain meaningful information quickly. In this paper, we evaluate how well a scientific application from the environmental sciences suits the MapReduce framework. We consider ccc-gistemp, a Python reimplementation of the original NASA GISS model for estimating global temperature change, which takes land and ocean temperature records from different sites, removes duplicate records, and adjusts for urbanisation effects before calculating the 12-month running mean global temperature. The application consists of several stages, each with different characteristics; three of these stages have been ported to Hadoop using the mrjob library. We note the performance bottlenecks encountered while porting and suggest possible solutions, including modifying data access patterns to overcome the uneven distribution of input data.
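
As a rough illustration of the kind of processing the abstract describes, the sketch below simulates a map and reduce pass in plain Python: readings are keyed by station, exact duplicates are dropped in the reducer, and a 12-month running mean is taken over the result. It is not the authors' code and does not use Hadoop or mrjob. The record layout `(station, month_index, temperature)` and all function names are assumptions made for this example, and the urbanisation adjustment is omitted.

```python
from collections import defaultdict
from statistics import mean


def map_record(record):
    """Map step: key each reading by station so duplicates meet in one reducer."""
    station, month_index, temp = record  # hypothetical record layout
    yield station, (month_index, temp)


def reduce_station(station, values):
    """Reduce step: drop exact duplicate readings and return temps in month order."""
    unique = sorted(set(values))
    return station, [temp for _, temp in unique]


def running_mean(series, window=12):
    """Mean of each consecutive window-length slice of the series."""
    return [mean(series[i:i + window]) for i in range(len(series) - window + 1)]


def run_job(records):
    """Simulate the shuffle in-process: group mapped pairs by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_station(k, v) for k, v in groups.items())


if __name__ == "__main__":
    # Thirteen monthly readings for one station, plus one exact duplicate.
    records = [("A", m, float(m)) for m in range(13)] + [("A", 0, 0.0)]
    series = run_job(records)["A"]
    print(running_mean(series))
```

With this keying, every record for a station goes to a single reducer, so a station with far more records than the others would dominate one reducer's work. This is one way that unevenly distributed input can show up as a bottleneck in a design like this.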