To sustain emerging data-intensive scientific applications, High Performance Computing (HPC) centers invest a notable fraction of their operating budget in a specialized fast storage system, the scratch space, which is designed to hold the data of currently running and soon-to-run HPC jobs. In practice, however, it is often used as a standard file system, wherein users store their data arbitrarily, without regard for the center's overall performance. To remedy this, centers periodically scan the scratch space in an attempt to purge transient and stale data. This practice of supporting an essentially cache-like workload with a regular file system and disjoint tools for staging and purging results in suboptimal use of the scratch space. In this paper, we address these issues by proposing a new perspective in which the HPC scratch space is treated as a cache, and data population, retention, and eviction tools are integrated with scratch management. In our approach, data is moved to the scratch space only when it is needed, and unneeded data is removed as soon as possible. We also design a new job-workflow-aware caching policy that leverages user-supplied hints to manage the cache. Our evaluation using three years of job logs from the Jaguar supercomputer shows that, compared to the widely used purge approach, workflow-aware caching improves scratch utilization by reducing the average amount of data read by 9.3%, and reduces job scheduling delays associated with data staging by 282.0% on average.
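To make the idea of workflow-aware eviction concrete, the following minimal sketch illustrates how user-supplied workflow hints could be combined with ordinary recency information when ranking scratch files for eviction. It is illustrative only: the hint names, file attributes, and ordering rules below are assumptions made for this sketch, not the paper's actual policy or implementation.

# Illustrative sketch of a workflow-aware eviction policy (assumed design,
# not the paper's implementation).

from dataclasses import dataclass
from enum import Enum
import time


class Hint(Enum):
    NO_LONGER_NEEDED = 0   # workflow is finished with this file
    OUTPUT_UNSENT = 1      # result data not yet offloaded to archival storage
    INPUT_PENDING = 2      # input staged for a queued or soon-to-run job


@dataclass
class ScratchFile:
    path: str
    size_bytes: int
    last_access: float      # POSIX timestamp
    hint: Hint = Hint.NO_LONGER_NEEDED


def eviction_candidates(files, bytes_needed):
    """Select files to evict, preferring workflow hints over pure recency.

    Files hinted NO_LONGER_NEEDED go first (oldest access first);
    OUTPUT_UNSENT files are evicted only if space is still short;
    INPUT_PENDING files are never evicted.
    """
    # Lower hint value and older access time mean "evict sooner".
    ordered = sorted(files, key=lambda f: (f.hint.value, f.last_access))

    chosen, freed = [], 0
    for f in ordered:
        if freed >= bytes_needed:
            break
        if f.hint is Hint.INPUT_PENDING:
            continue  # protect data staged for a queued job
        chosen.append(f)
        freed += f.size_bytes
    return chosen


if __name__ == "__main__":
    now = time.time()
    demo = [
        ScratchFile("/scratch/u1/run42/out.h5", 40 << 30, now - 3600, Hint.OUTPUT_UNSENT),
        ScratchFile("/scratch/u2/old/tmp.dat", 10 << 30, now - 86400, Hint.NO_LONGER_NEEDED),
        ScratchFile("/scratch/u3/next/in.nc", 20 << 30, now - 60, Hint.INPUT_PENDING),
    ]
    for f in eviction_candidates(demo, bytes_needed=30 << 30):
        print("evict:", f.path)

The key design point the sketch captures is that eviction order is driven first by where a file sits in its job workflow (as conveyed by hints) and only secondarily by access recency, in contrast to a hint-oblivious purge that considers age alone.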