Mammoth: autonomic data processing framework for scientific state-transition applications

  • Authors: Xin Yang; Ze Yu; Min Li; Xiaolin Li

  • Affiliations: University of Florida; University of Florida; University of Florida; University of Florida

  • Venue: Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference
  • Year: 2013

Abstract

Scientific computing is becoming increasingly data-intensive, and more high-impact discoveries rely on efficient processing of big scientific data. Popular MapReduce frameworks such as Hadoop offer an alternative to conventional solutions (e.g., MPI or OpenMP). However, they perform only moderately well on state-transition applications. There are three key challenges: (1) these applications generate inflated intermediate data that may saturate the network; (2) they may incur substantial synchronization overhead if not managed well; (3) dynamically evolving scientific phenomena result in heterogeneous data distributions, causing significant computation skew. In this paper, we propose Mammoth, an autonomic parallel data processing framework for scientific state-transition applications. Mammoth features a MapReduce-style programming model that is familiar to users. To address these challenges, it is further enhanced with a series of optimizations that parallelize the computation automatically and efficiently. We evaluate Mammoth on a weather prediction application with real-world datasets. The experimental evaluation demonstrates that Mammoth is competitive with the MPI-based solution and at least 30% faster than the optimized Hadoop-based solution.