A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
SUIF: an infrastructure for research on parallelizing and optimizing compilers
ACM SIGPLAN Notices
Going Beyond Integer Programming with the Omega Test to Eliminate False Data Dependences
IEEE Transactions on Parallel and Distributed Systems
A study of integrated prefetching and caching strategies
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Informed prefetching and caching
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Automatic compiler-inserted I/O prefetching for out-of-core applications
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
A trace-driven comparison of algorithms for parallel prefetching and caching
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
An extended two-phase method for accessing sections of out-of-core arrays
Scientific Programming
Informed multi-process prefetching and caching
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Compiler-based I/O prefetching for out-of-core applications
ACM Transactions on Computer Systems (TOCS)
Optimal prefetching and caching for parallel I/O sytems
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Post-pass binary adaptation for software-based speculative precomputation
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Design and evaluation of compiler algorithms for pre-execution
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
An Experimental Evaluation of I/O Optimizations on Different Applications
IEEE Transactions on Parallel and Distributed Systems
Parallel Out-of-Core Cholesky and QR Factorization with POOCLAPACK
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Improving I/O response times via prefetching and storage system reorganization
COMPSAC '97 Proceedings of the 21st International Computer Software and Applications Conference
My Cache or Yours? Making Storage More Exclusive
ATEC '02 Proceedings of the General Track of the annual conference on USENIX Annual Technical Conference
Discretionary Caching for I/O on Clusters
CCGRID '03 Proceedings of the 3st International Symposium on Cluster Computing and the Grid
Automatic ARIMA Time Series Modeling for Adaptive I/O Prefetching
IEEE Transactions on Parallel and Distributed Systems
ARC: A Self-Tuning, Low Overhead Replacement Cache
FAST '03 Proceedings of the 2nd USENIX Conference on File and Storage Technologies
CAR: Clock with Adaptive Replacement
FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
CLOCK-Pro: an effective improvement of the CLOCK replacement
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
DULO: an effective buffer cache management scheme to exploit both temporal and spatial locality
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Second-tier cache management using write hints
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Managing prefetch memory for data-intensive online servers
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Taming the memory hogs: using compiler-inserted releases to manage physical memory intelligently
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Karma: know-it-all replacement for a multilevel cache
FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
AMP: adaptive multi-stream prefetching in a shared cache
FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
PVFS: a parallel file system for linux clusters
ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
Competitive prefetching for concurrent sequential I/O
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
IEEE Transactions on Software Engineering
DiskSeen: exploiting disk layout and access history to enhance I/O prefetch
ATC'07 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference
Helper thread prefetching for loosely-coupled multiprocessor systems
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Cashing in on hints for better prefetching and caching in PVFS and MPI-IO
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Effective parallelization of loops in the presence of I/O operations
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Hi-index | 0.00 |
I/O prefetching has been employed in the past as one of the mechanisms to hide large disk latencies. However, I/O prefetching in parallel applications is problematic when multiple CPUs share the same set of disks due to the possibility that prefetches from different CPUs can interact on shared memory caches in the I/O nodes in complex and unpredictable ways. In this paper, we (i) quantify the impact of compiler-directed I/O prefetching - developed originally in the context of sequential execution - on shared caches at I/O nodes. The experimental data collected shows that while I/O prefetching brings benefits, its effectiveness reduces significantly as the number of CPUs is increased; (ii) identify inter-CPU misses due to harmful prefetches as one of the main sources for this reduction in performance with the increased number of CPUs; and (iii) propose and experimentally evaluate a profiler and compiler assisted adaptive I/O prefetching scheme targeting shared storage caches. The proposed scheme obtains inter-thread data sharing information using profiling and, based on the captured data sharing patterns, divides the threads into clusters and assigns a separate (customized) I/O prefetcher thread for each cluster. In our approach, the compiler generates the I/O prefetching threads automatically. We implemented this new I/O prefetching scheme using a compiler and the PVFS file system running on Linux, and the empirical data collected clearly underline the importance of adapting I/O prefetching based on program phases. Specifically, our proposed scheme improves performance, on average, by 19.9%, 11.9% and 10.3% over the cases without I/O prefetching, with independent I/O prefetching (each CPU is performing compiler-directed I/O prefetching independently), and with one CPU prefetching (one CPU is reserved for prefetching on behalf of others), respectively, when 8 CPUs are used.