Maximizing efficiency by trading storage for computation

Authors:
Ian F. Adams;Darrell D. E. Long;Ethan L. Miller;Shankar Pasupathy;Mark W. Storer
Affiliations:
University of California, Santa Cruz;University of California, Santa Cruz;University of California, Santa Cruz;NetApp;Pergamum Systems
Venue:
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
Year:
2009


Abstract

Traditionally, computing has meant calculating results and then storing those results for later use. Unfortunately, committing large volumes of rarely used data to storage wastes space and energy, making it a very expensive strategy. Cloud computing, with its readily available and flexibly allocatable computing resources, suggests an alternative: storing the provenance data and the means to recompute results as needed. While computation and storage are equivalent, finding the balance between the two that maximizes efficiency is difficult. One of the fundamental challenges is rooted in the knowledge gap separating users from cloud administrators: neither has a completely informed view. Users have a semantic understanding of their data, while administrators understand the cloud's underlying structure. We detail the user knowledge and system knowledge needed to construct a comprehensive cost model for analyzing the trade-off between storing a result and regenerating it, allowing users and administrators to make an informed cost-benefit analysis.
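The store-versus-regenerate comparison described in the abstract can be illustrated with a minimal sketch. This is not the paper's cost model: every field name, unit, and formula below is a hypothetical simplification, assuming a flat per-GB monthly storage price, a fixed price per regeneration, and a user-supplied estimate of how often the result is needed.

```python
from dataclasses import dataclass


@dataclass
class ResultProfile:
    """Hypothetical inputs; none of these names or units come from the paper."""
    size_gb: float                      # size of the computed result
    provenance_size_gb: float           # inputs and process description kept for recomputation
    storage_cost_per_gb_month: float    # system knowledge: price of keeping 1 GB for a month
    recompute_cost: float               # system knowledge: price of one regeneration
    expected_accesses_per_month: float  # user knowledge: how often the result is needed


def monthly_store_cost(p: ResultProfile) -> float:
    """Cost of keeping the result itself in storage for one month."""
    return p.size_gb * p.storage_cost_per_gb_month


def monthly_recompute_cost(p: ResultProfile) -> float:
    """Cost of keeping only the provenance and regenerating on each access."""
    keep_provenance = p.provenance_size_gb * p.storage_cost_per_gb_month
    regenerate = p.expected_accesses_per_month * p.recompute_cost
    return keep_provenance + regenerate


def cheaper_strategy(p: ResultProfile) -> str:
    """Return "store" or "recompute", whichever is cheaper under this toy model."""
    if monthly_store_cost(p) <= monthly_recompute_cost(p):
        return "store"
    return "recompute"


# Illustrative numbers only: a large result that is rarely read.
rarely_used = ResultProfile(
    size_gb=100.0,
    provenance_size_gb=1.0,
    storage_cost_per_gb_month=0.10,
    recompute_cost=2.0,
    expected_accesses_per_month=0.5,
)
print(cheaper_strategy(rarely_used))
```

The sketch splits its inputs the way the abstract splits knowledge: access frequency is something only the user can estimate, while storage and computation prices are known to the administrator. A fuller model would also need factors this sketch ignores, such as regeneration latency and the risk that a result can no longer be reproduced.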