Infrastructure-as-a-Service (IaaS) cloud computing has changed the way computational resources are acquired: users can deploy virtual machines (VMs) at large scale and pay only for the resources actually used during the runtime of the VMs. This model raises new challenges in the design and development of IaaS middleware: excessive storage costs associated with both user data and VM images might make the cloud less attractive, especially for users who need to manipulate huge data sets and a large number of VM images. Storage costs result not only from storage space utilization but also from bandwidth consumption: typical deployments perform a large number of data transfers between the VMs and the persistent storage, all under high performance requirements. This paper evaluates the trade-off of transparently applying data compression to conserve storage space and bandwidth at the cost of a slight computational overhead. We aim to reduce storage space and bandwidth needs with minimal impact on data access performance. Our solution builds on BlobSeer, a distributed data management service specifically designed to sustain high throughput for concurrent accesses to huge data sequences that are distributed at large scale. Extensive experiments demonstrate that our approach reduces bandwidth and storage space utilization by at least 40%, while still attaining high performance levels that even surpass the original (no compression) performance levels in several data-intensive scenarios.
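
The trade-off named in the abstract (less data stored and transferred, in exchange for some CPU time) can be illustrated with a minimal sketch. This is not the BlobSeer implementation, and the abstract does not specify the codec or the selection policy. The sketch assumes `zlib` from the Python standard library as a stand-in compressor, and the chunk size, sample size, saving threshold, tag bytes, and all function names are hypothetical. Each chunk is compressed only when a small sample suggests a worthwhile saving, so incompressible data is passed through with little wasted effort.

```python
import os
import zlib

CHUNK_SIZE = 64 * 1024    # hypothetical chunk size
SAMPLE_SIZE = 4 * 1024    # hypothetical: bytes sampled to estimate compressibility
MIN_SAVING = 0.10         # hypothetical: require at least 10% saving on the sample

# One-byte tag prepended to every chunk so the reader knows how to decode it.
RAW, DEFLATE = b"\x00", b"\x01"


def encode_chunk(chunk: bytes) -> bytes:
    """Compress a chunk only when a small sample suggests it is worthwhile."""
    sample = chunk[:SAMPLE_SIZE]
    if sample and len(zlib.compress(sample, 1)) <= len(sample) * (1 - MIN_SAVING):
        packed = zlib.compress(chunk, 1)  # level 1 favors speed over ratio
        if len(packed) < len(chunk):
            return DEFLATE + packed
    return RAW + chunk


def decode_chunk(blob: bytes) -> bytes:
    """Invert encode_chunk using the leading tag byte."""
    tag, payload = blob[:1], blob[1:]
    return zlib.decompress(payload) if tag == DEFLATE else payload


def transferred_fraction(data: bytes) -> float:
    """Bytes sent after encoding, relative to the original size."""
    if not data:
        return 1.0
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    return sum(len(encode_chunk(c)) for c in chunks) / len(data)


if __name__ == "__main__":
    text_like = b"the quick brown fox jumps over the lazy dog " * 5000
    random_like = os.urandom(4 * CHUNK_SIZE)
    print("text-like data:  ", round(transferred_fraction(text_like), 3))
    print("random-like data:", round(transferred_fraction(random_like), 3))
```

Because the decision is made per chunk and recorded in a tag byte, the reader needs no extra metadata, which is one simple way to keep compression transparent to the application. The sampling step is only a heuristic: a chunk whose first bytes are unrepresentative may be misclassified.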