A case for redundant arrays of inexpensive disks (RAID)
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
GASS: a data movement and access service for wide area computing systems
Proceedings of the sixth workshop on I/O in parallel and distributed systems
OceanStore: an architecture for global-scale persistent storage
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
GPFS: A Shared-Disk File System for Large Computing Clusters
FAST '02 Proceedings of the Conference on File and Storage Technologies
Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications
HPDC '02 Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing
Predicting Sporadic Grid Data Transfers
HPDC '02 Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing
The parallel I/O architecture of the high-performance storage system (HPSS)
MSS '95 Proceedings of the 14th IEEE Symposium on Mass Storage Systems
The Kangaroo Approach to Data Movement on the Grid
HPDC '01 Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Storage resource managers: essential components for the Grid
Grid resource management
Stork: Making Data Placement a First Class Citizen in the Grid
ICDCS '04 Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS'04)
A Power-Aware Run-Time System for High-Performance Computing
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Co-scheduling of computation and data on computer clusters
SSDBM'2005 Proceedings of the 17th international conference on Scientific and statistical database management
Coupling prefix caching and collective downloads for remote dataset access
Proceedings of the 20th annual international conference on Supercomputing
Explicit control a batch-aware distributed file system
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
PVFS: a parallel file system for linux clusters
ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
IEEE Communications Magazine
Timely offloading of result-data in HPC centers
Proceedings of the 22nd annual international conference on Supercomputing
DIMM: a distributed metadata management for data-intensive HPC environments
DADC '08 Proceedings of the 2008 international workshop on Data-aware distributed computing
/scratch as a cache: rethinking HPC center scratch storage
Proceedings of the 23rd international conference on Supercomputing
Hi-index | 0.00 |
Procurement and the optimized utilization of Petascale supercomputers and centers is a renewed national priority. Sustained performance and availability of such large centers is a key technical challenge significantly impacting their usability. Storage systems are known to be the primary fault source leading to data unavailability and job resubmissions. This results in reduced center performance, partially due to the lack of coordination between I/O activities and job scheduling. In this work, we propose the coordination of job scheduling with data staging/offloading and on-demand staged data reconstruction to address the availability of job input data and to improve center-wide performance. Fundamental to both mechanisms is the efficient management of transient data: in the way it is scheduled and recovered. Collectively, from a center's standpoint, these techniques optimize resource usage and increase its data/service availability. From a user's standpoint, they reduce the job turnaround time and optimize the allocated time usage.