An Adaptive Cache Coherence Protocol Specification for Parallel Input/Output Systems
IEEE Transactions on Parallel and Distributed Systems
Improving I/O performance of applications through compiler-directed code restructuring
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Prefetch throttling and data pinning for improving performance of shared caches
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Profiler and compiler assisted adaptive I/O prefetching for shared storage caches
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Compiler-directed file layout optimization for hierarchical storage systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Compiler-directed file layout optimization for hierarchical storage systems
Scientific Programming - Selected Papers from Super Computing 2012
I/O bottlenecks are already a problem in many large-scale applications that manipulate huge datasets. This problem is expected to get worse as applications get larger and I/O subsystem performance lags behind processor and memory speed improvements. Caching I/O blocks is one effective way of alleviating disk latencies, and there can be multiple levels of caching on a cluster of workstations. Previous studies have shown the benefits of caching - whether it be local to a particular node, or a shared global cache across the cluster - for certain applications. However, we show that while caching is useful in some situations, it can hurt performance if we are not careful about what to cache and when to bypass the cache. This paper presents compilation techniques and runtime support to address this problem. These techniques are implemented and evaluated on an experimental Linux/Pentium cluster running a parallel file system. Our results using a diverse set of applications (scientific and commercial) demonstrate the benefits of a discretionary approach to caching for I/O subsystems on clusters, providing as much as 33% savings over indiscriminately caching everything in some applications.