Constructing collaborative desktop storage caches for large scientific datasets

Authors:
Sudharshan S. Vazhkudai;Xiaosong Ma;Vincent W. Freeh;Jonathan W. Strickland;Nandan Tammineedi;Tyler Simon;Stephen L. Scott
Affiliations:
Oak Ridge National Laboratory, Oak Ridge, TN;North Carolina State University, Raleigh, NC;North Carolina State University, Raleigh, NC;North Carolina State University, Raleigh, NC;North Carolina State University, Raleigh, NC;Oak Ridge National Laboratory, Oak Ridge, TN;Oak Ridge National Laboratory, Oak Ridge, TN
Venue:
ACM Transactions on Storage (TOS)
Year:
2006

Abstract

High-end computing is suffering a data deluge from experiments, simulations, and apparatus that creates overwhelming application dataset sizes. This has led to the proliferation of high-end mass storage systems, storage area clusters, and data centers. These storage facilities offer a large range of choices in terms of capacity and access rate, as well as strong data availability and consistency support. However, for most end-users, the “last mile” in their analysis pipeline often requires data processing and visualization at local computers, typically local desktop workstations. End-user workstations, despite having more processing power than ever before, are ill-equipped to cope with such data demands due to insufficient secondary storage space and I/O rates. Meanwhile, a large portion of desktop storage is unused.
We propose the FreeLoader framework, which aggregates unused desktop storage space and I/O bandwidth into a shared cache/scratch space for hosting large, immutable datasets and exploiting data access locality. This article presents the FreeLoader architecture, component design, and performance results based on our proof-of-concept prototype. Its architecture comprises contributing benefactor nodes, steered by a management layer, providing services such as data integrity, high performance, load balancing, and impact control. Our experiments show that FreeLoader is an appealing low-cost solution for storing massive datasets, delivering higher data access rates than traditional storage facilities, namely, local or remote shared file systems, storage systems, and Internet data repositories. In particular, we present novel data striping techniques that allow FreeLoader to efficiently aggregate a workstation's network communication bandwidth and local I/O bandwidth. In addition, the performance impact on the native workload of donor machines is small and can be effectively controlled.
Further, we show that security features such as data encryption and integrity checks can be easily added as filters for interested clients. Finally, we demonstrate how legacy applications can use the FreeLoader API to store and retrieve datasets.
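
To make the idea of benefactor nodes steered by a management layer more concrete, here is a minimal sketch in Python. It is not the FreeLoader implementation or its API: the class names (Benefactor, Manager), the fixed chunk size, the SHA-256 checksums, and the plain round-robin placement are all assumptions chosen for illustration. The abstract does not specify the striping techniques the article proposes, so ordinary round-robin striping is substituted here.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Benefactor:
    """Hypothetical stand-in for a workstation donating unused disk space."""
    name: str
    chunks: dict = field(default_factory=dict)  # (dataset, chunk index) -> bytes


@dataclass
class Manager:
    """Hypothetical management layer: records chunk placement and checksums."""
    benefactors: list
    chunk_size: int = 4  # tiny value for illustration only
    placement: dict = field(default_factory=dict)  # dataset -> [(node index, sha256)]

    def store(self, dataset: str, data: bytes) -> None:
        """Split a write-once dataset into chunks and stripe them round-robin."""
        entries = []
        for offset in range(0, len(data), self.chunk_size):
            index = offset // self.chunk_size
            chunk = data[offset:offset + self.chunk_size]
            node = index % len(self.benefactors)  # plain round-robin (assumption)
            self.benefactors[node].chunks[(dataset, index)] = chunk
            entries.append((node, hashlib.sha256(chunk).hexdigest()))
        self.placement[dataset] = entries

    def retrieve(self, dataset: str) -> bytes:
        """Fetch every chunk in order, verifying each checksum before use."""
        parts = []
        for index, (node, digest) in enumerate(self.placement[dataset]):
            chunk = self.benefactors[node].chunks[(dataset, index)]
            if hashlib.sha256(chunk).hexdigest() != digest:
                raise ValueError(f"chunk {index} of {dataset} failed integrity check")
            parts.append(chunk)
        return b"".join(parts)


nodes = [Benefactor("ws1"), Benefactor("ws2"), Benefactor("ws3")]
manager = Manager(nodes)
manager.store("sim.dat", b"0123456789")
restored = manager.retrieve("sim.dat")
```

In this sketch a client would read chunks from several benefactors, which is where the aggregation of network and disk bandwidth would come from in a real deployment; here the reads are sequential and in-memory, so the code only illustrates placement and integrity checking.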