TomusBlobs: Towards Communication-Efficient Storage for MapReduce Applications in Azure

  • Authors:
  • Radu Tudoran;Alexandru Costan;Gabriel Antoniu;Hakan Soncu

  • Venue:
  • CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
  • Year:
  • 2012

Abstract

The emergence of cloud computing has brought the opportunity to use large-scale compute infrastructures for a broad spectrum of applications and users. While the cloud paradigm is attractive for its "elasticity" in resource usage and associated costs (users pay only for the resources they actually use), cloud applications still suffer from the high latencies and low performance of cloud storage services. Enabling high-throughput processing of massive cloud data therefore becomes a critical issue, as it affects overall application performance. In this paper we address this challenge at the level of cloud storage. We introduce a concurrency-optimized data storage system that federates the virtual disks associated with VMs. We demonstrate the performance of our solution for efficient data-intensive processing on commercial clouds by building an optimized prototype MapReduce framework for Azure that leverages the benefits of our storage solution. We perform extensive synthetic benchmarks as well as experiments with real-world applications; they demonstrate that our solution brings substantial benefits to data-intensive applications compared to approaches relying on state-of-the-art cloud object storage.