Discretionary Caching for I/O on Clusters

  • Authors:
  • Murali Vilayannur; Anand Sivasubramaniam; Mahmut Kandemir; Rajeev Thakur; Robert Ross

  • Affiliations:
  • Department of Computer Science and Engineering, Pennsylvania State University 16802; Department of Computer Science and Engineering, Pennsylvania State University 16802; Department of Computer Science and Engineering, Pennsylvania State University 16802; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne 60439; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne 60439

  • Venue:
  • Cluster Computing
  • Year:
  • 2006

Abstract

I/O bottlenecks are already a problem in many large-scale applications that manipulate huge datasets. This problem is expected to get worse as applications grow larger and I/O subsystem performance continues to lag behind improvements in processor and memory speed. At the same time, off-the-shelf clusters of workstations are becoming a popular platform for demanding applications due to their cost-effectiveness and widespread deployment. Caching I/O blocks is one effective way of alleviating disk latencies, and there can be multiple levels of caching on a cluster of workstations.

Previous studies have shown the benefits of caching for certain applications, whether the cache is local to a particular node or shared globally across the cluster. However, we show that while caching is useful in some situations, it can hurt performance if we are not careful about what to cache and when to bypass the cache. This paper presents compilation techniques and runtime support to address this problem. These techniques are implemented and evaluated on an experimental Linux/Pentium cluster running a parallel file system. Our results, using a diverse set of scientific and commercial applications, demonstrate the benefits of a discretionary approach to caching for I/O subsystems on clusters: in some applications it saves as much as 48% of overall execution time compared with indiscriminately caching everything.
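
To illustrate why bypassing the cache can help, here is a minimal sketch in Python. It is not the paper's compiler or runtime mechanism, which the abstract does not describe in detail. It assumes a small LRU block cache and a per-request `bypass` hint; the names `DiscretionaryCache`, `read_block`, and `run` are hypothetical and chosen only for this example. A one-pass scan over blocks that are never reused evicts a small hot working set when everything is cached, but leaves it intact when the scan bypasses the cache.

```python
from collections import OrderedDict


class DiscretionaryCache:
    """Toy LRU block cache whose read() accepts a per-request bypass hint.

    Hypothetical illustration only; not the mechanism from the paper.
    """

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block  # assumed backing-store reader
        self.blocks = OrderedDict()   # block_id -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, block_id, bypass=False):
        if block_id in self.blocks:
            # Hit: mark the block as most recently used.
            self.blocks.move_to_end(block_id)
            self.hits += 1
            return self.blocks[block_id]
        self.misses += 1
        data = self.read_block(block_id)
        if not bypass:
            # Only cache the block when the caller did not ask to bypass.
            self.blocks[block_id] = data
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
        return data


def run(bypass_scan):
    """Warm a hot set, stream a one-pass scan, then reread the hot set."""
    cache = DiscretionaryCache(capacity=4, read_block=lambda b: f"block-{b}")
    hot = [0, 1, 2, 3]
    for b in hot:
        cache.read(b)
    for b in range(100, 110):
        cache.read(b, bypass=bypass_scan)
    for b in hot:
        cache.read(b)
    return cache.hits, cache.misses


if __name__ == "__main__":
    print("cache everything:", run(bypass_scan=False))
    print("bypass the scan: ", run(bypass_scan=True))
```

In this toy workload, caching everything gives 0 hits and 18 misses, while bypassing the scan gives 4 hits and 14 misses. In a real system the decision of which accesses to mark as bypass would have to come from somewhere, such as the compile-time analysis and runtime support the abstract mentions.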