Future Generation Computer Systems
The emergence of cloud computing has brought the opportunity to use large-scale compute infrastructures for a broad spectrum of applications and users. While the cloud paradigm is attractive for its "elasticity" in resource usage and associated costs (users pay only for the resources they actually use), cloud applications still suffer from the high latencies and low performance of cloud storage services. Enabling high-throughput processing of massive data in the cloud therefore becomes a critical issue, as it impacts overall application performance. In this paper we address this challenge at the level of cloud storage. We introduce a concurrency-optimized data storage system that federates the virtual disks associated with VMs. We demonstrate the performance of our solution for efficient data-intensive processing on commercial clouds by building an optimized prototype MapReduce framework for Azure that leverages the benefits of our storage solution. We perform extensive synthetic benchmarks as well as experiments with real-world applications, which demonstrate that our solution brings substantial benefits to data-intensive applications compared to approaches relying on state-of-the-art cloud object storage.