Discretionary Caching for I/O on Clusters

Authors:
Murali Vilayannur; Anand Sivasubramaniam; Mahmut Kandemir; Rajeev Thakur; Robert Ross
Affiliations:
Department of Computer Science and Engineering, Pennsylvania State University 16802; Department of Computer Science and Engineering, Pennsylvania State University 16802; Department of Computer Science and Engineering, Pennsylvania State University 16802; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne 60439; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne 60439
Venue:
Cluster Computing
Year:
2006

Abstract

I/O bottlenecks are already a problem in many large-scale applications that manipulate huge datasets. The problem is expected to worsen as applications grow and I/O subsystem performance continues to lag behind improvements in processor and memory speed. At the same time, off-the-shelf clusters of workstations are becoming a popular platform for demanding applications because of their cost-effectiveness and widespread deployment. Caching I/O blocks is one effective way of alleviating disk latencies, and a cluster of workstations can have multiple levels of caching. Previous studies have shown the benefits of caching, whether local to a particular node or shared globally across the cluster, for certain applications. However, we show that while caching is useful in some situations, it can hurt performance unless care is taken over what to cache and when to bypass the cache. This paper presents compilation techniques and runtime support to address this problem. These techniques are implemented and evaluated on an experimental Linux/Pentium cluster running a parallel file system. Our results, obtained with a diverse set of scientific and commercial applications, demonstrate the benefits of a discretionary approach to caching for I/O subsystems on clusters. In some applications, this approach reduces overall execution time by as much as 48% compared with indiscriminately caching everything.
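The point that indiscriminate caching can hurt, and that bypassing the cache for some accesses can help, can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the compilation techniques or runtime support presented in the paper: the `BlockCache` class, the `bypass` flag, the LRU replacement policy, and the access pattern are all hypothetical, chosen to show how blocks that are read once can evict blocks that are reused.

```python
from collections import OrderedDict


class BlockCache:
    """Tiny LRU block cache (hypothetical stand-in for a node-local I/O cache)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> placeholder payload
        self.hits = 0
        self.misses = 0

    def read(self, block_id, bypass=False):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1  # a real system would fetch from disk or a server here
        if bypass:
            return  # discretionary path: do not install the block in the cache
        self.blocks[block_id] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


def run(discretionary):
    """Interleave a small reused working set with a read-once stream (assumed workload)."""
    cache = BlockCache(capacity=4)
    hot_blocks = [0, 1, 2, 3]  # reused every round
    next_stream_block = 100    # each stream block is read exactly once
    for _ in range(10):
        for block in hot_blocks:
            cache.read(block)
        for _ in range(4):
            cache.read(next_stream_block, bypass=discretionary)
            next_stream_block += 1
    return cache.hits, cache.misses


if __name__ == "__main__":
    print("cache everything:", run(discretionary=False))
    print("bypass stream:   ", run(discretionary=True))
```

In this assumed workload, caching everything lets the read-once stream push the reused blocks out of the cache before they are touched again, whereas bypassing the stream keeps them resident. In the paper, the decision about what to cache comes from compilation techniques and runtime support; here the `bypass` argument is simply set by hand.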