Scientific datasets are typically archived at mass storage systems or data centers close to supercomputers/instruments. End-users of these datasets, however, usually perform parts of their workflows at their local computers. In such cases, client-side caching can offer significant gains by reducing the cost of wide-area data movement. Scientific data caches, however, traditionally cache entire datasets, which may not be necessary. In this paper, we propose a novel combination of prefix caching and collective download. Prefix caching allows the bootstrapping of dataset downloads by caching only a prefix of the dataset, while collective download facilitates efficient parallel patching of the missing suffix from an external data source. To estimate the optimal prefix size, we further present an analytical model that considers both the initial download overhead and the downloading speed. We implemented our proposed approach in the FreeLoader distributed cache prototype. Experimental results (using multiple scientific data repositories and data transfer tools, as well as a real-world scientific dataset access trace) demonstrate that prefix caching and collective download can be implemented efficiently, our model can select an appropriate prefix size, and the cache hit rate can be improved significantly without hurting the local access rate of cached datasets.
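The abstract does not give the analytical model itself, so the following is only a minimal sketch of how a prefix size could be estimated from the two quantities it names, the initial download overhead and the downloading speed. It assumes a simple streaming-style condition: every suffix byte must arrive from the remote source before the client, reading at the local cache rate, reaches it. The function names (`min_prefix_size`, `split_suffix`), the parameters, and the even range split among downloading nodes are all illustrative assumptions, not the paper's model or the FreeLoader API.

```python
import math


def min_prefix_size(dataset_size, local_rate, remote_rate, startup_overhead):
    """Smallest prefix (same unit as dataset_size) that hides remote latency.

    Assumed model (not taken from the paper): the client consumes data at
    local_rate; the suffix download starts at time 0, pays startup_overhead
    seconds before the first byte, then proceeds at remote_rate.
    """
    if dataset_size < 0 or startup_overhead < 0:
        raise ValueError("dataset_size and startup_overhead must be non-negative")
    if local_rate <= 0 or remote_rate <= 0:
        raise ValueError("rates must be positive")

    if remote_rate >= local_rate:
        # The remote source keeps up once started, so the prefix only has
        # to cover the startup overhead.
        prefix = startup_overhead * local_rate
    else:
        # The remote source is slower: the last byte is the tightest
        # constraint, overhead + (S - P) / remote <= S / local.
        prefix = (dataset_size * (1 - remote_rate / local_rate)
                  + remote_rate * startup_overhead)

    # A prefix can never be larger than the dataset itself.
    return min(max(prefix, 0.0), float(dataset_size))


def split_suffix(dataset_size, prefix_size, workers):
    """Divide the missing suffix into half-open byte ranges, one per worker.

    Hypothetical stand-in for collective download: each range would be
    fetched in parallel by a different node.
    """
    if workers <= 0:
        raise ValueError("workers must be positive")
    suffix = dataset_size - prefix_size
    if suffix <= 0:
        return []
    chunk = math.ceil(suffix / workers)
    ranges = []
    start = prefix_size
    while start < dataset_size:
        end = min(start + chunk, dataset_size)
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == "__main__":
    # Example: 1000 MB dataset, 100 MB/s local access, 50 MB/s remote, 2 s overhead.
    prefix = min_prefix_size(1000, 100, 50, 2)
    print(prefix)
    print(split_suffix(1000, int(prefix), 4))
```

Under these assumptions, a remote source that is slower than local access forces the prefix to grow with the dataset size, while a faster remote source needs only enough prefix to mask the startup overhead. A real deployment would also have to account for rate variability and per-node bandwidth, which this sketch ignores.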