stdchk: A Checkpoint Storage System for Desktop Grid Computing

Authors:
Samer Al-Kiswany;Matei Ripeanu;Sudharshan S. Vazhkudai;Abdullah Gharaibeh
Venue:
ICDCS '08: Proceedings of the 28th International Conference on Distributed Computing Systems
Year:
2008

Abstract

Checkpointing is an indispensable technique for providing fault tolerance to long-running, high-throughput applications such as those running on desktop grids. This article argues that a checkpoint storage system optimized for these environments offers multiple benefits: it reduces the load on a traditional file system, delivers high performance through specialization, and optimizes data management by taking checkpoint application semantics into account. Such a storage system can present a unifying abstraction for checkpoint operations while hiding the fact that no dedicated resources store the checkpoint data. We prototype stdchk, a checkpoint storage system that uses scavenged disk space from participating desktops to build a low-cost storage system and offers a traditional file system interface for easy integration with applications. This article presents the stdchk architecture, its key performance optimizations, and its support for incremental checkpointing and increased data availability. Our evaluation confirms that the stdchk approach is viable in a desktop grid setting and offers a low-cost storage system with desirable performance characteristics: high write throughput, as well as reduced storage space and network effort to save checkpoint images.
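
To illustrate how incremental checkpointing can reduce storage space and network effort, here is a minimal Python sketch. The abstract does not say how stdchk detects data shared between successive checkpoint images, so the fixed-size chunking, the SHA-256 digests, and every name here (`incremental_save`, `restore`, the in-memory `store` dictionary) are assumptions made for illustration, not the stdchk implementation.

```python
import hashlib

# Hypothetical default chunk size; the abstract does not specify one.
DEFAULT_CHUNK_SIZE = 64 * 1024


def incremental_save(image, store, chunk_size=DEFAULT_CHUNK_SIZE):
    """Save a checkpoint image, writing only chunks the store lacks.

    `store` maps a chunk digest to the chunk bytes and stands in for
    the scavenged storage nodes. Returns (manifest, bytes_written),
    where the manifest is the ordered list of chunk digests.
    """
    manifest = []
    bytes_written = 0
    for offset in range(0, len(image), chunk_size):
        chunk = image[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            # Only previously unseen chunks cost space and network transfer.
            store[digest] = chunk
            bytes_written += len(chunk)
        manifest.append(digest)
    return manifest, bytes_written


def restore(manifest, store):
    """Rebuild a checkpoint image from its manifest."""
    return b"".join(store[digest] for digest in manifest)
```

In this sketch, saving a second image that differs from the first in a single chunk writes only that chunk, while both images remain fully restorable from their manifests. Fixed-size chunking only finds data that stays at the same offset between images.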