Improving I/O performance is an important issue for many data-intensive, large-scale parallel applications. Although storage caches are used to reduce the I/O latencies of parallel applications, most prior work has focused on the management and partitioning of cache space. In particular, the compiler's role in taking advantage of multilevel storage caches has been largely unexplored. The main contribution of this paper is a shared-storage-cache-aware loop iteration distribution (iteration-to-processor mapping) scheme for I/O-intensive applications that manipulate disk-resident data sets. The proposed scheme is compiler-directed and can be tuned to target any multilevel storage cache hierarchy. At its core lies an iterative strategy that clusters loop iterations based on the underlying storage cache hierarchy and on the way the storage caches in that hierarchy are shared by different processors. We tested this mapping scheme using a set of eight I/O-intensive application programs. The results collected so far are promising. Our proposed scheme improves the I/O performance of the tested applications by 26.3% on average, and this improvement leads to an average 18.9% reduction in the overall execution latencies of these applications. Moreover, our scheme performs significantly better than a state-of-the-art (but storage-cache-hierarchy-agnostic) data locality optimization scheme. We also present an enhancement to our baseline implementation that performs local scheduling once the loop iteration distribution has been performed. We observe that applying this enhancement further improves I/O latency and total execution time, by 30.7% and 21.9%, respectively.
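To make the idea of hierarchy-aware iteration-to-processor mapping concrete, the following is a minimal Python sketch. It is not the algorithm of the paper, whose details the abstract does not give. It swaps in a simple heuristic: order iterations so that neighbors touch similar disk blocks (a nearest-neighbor chain), then hand contiguous chunks of that order to the subtrees of the cache hierarchy. All names (`leaves`, `order_by_overlap`, `assign`, `accesses`, `hierarchy`) and the nested-list encoding of the hierarchy are assumptions made for illustration.

```python
def leaves(node):
    """Count processors under a hierarchy node (int = processor, list = shared cache)."""
    return 1 if isinstance(node, int) else sum(leaves(child) for child in node)


def order_by_overlap(iters, accesses):
    """Chain iterations so each one shares as many blocks as possible with the previous one."""
    remaining = list(iters)
    if not remaining:
        return []
    chain = [remaining.pop(0)]
    while remaining:
        last = accesses[chain[-1]]
        # max() returns the first maximal element, so ties go to the earliest iteration.
        nxt = max(remaining, key=lambda j: len(accesses[j] & last))
        remaining.remove(nxt)
        chain.append(nxt)
    return chain


def assign(chain, node):
    """Give contiguous chunks of the chain to each subtree, proportional to its processor count."""
    if isinstance(node, int):
        return {node: list(chain)}
    mapping, rest, left = {}, list(chain), leaves(node)
    for child in node:
        n = leaves(child)
        take = -(-len(rest) * n // left)  # ceiling division keeps chunks balanced
        mapping.update(assign(rest[:take], child))
        rest, left = rest[take:], left - n
    return mapping


# Hypothetical input: the set of disk blocks touched by each loop iteration.
accesses = {
    0: {"a", "b"}, 1: {"x", "y"}, 2: {"b", "c"}, 3: {"y", "z"},
    4: {"a", "b"}, 5: {"x", "y"}, 6: {"b", "c"}, 7: {"y", "z"},
}
# Hypothetical hierarchy: processors 0 and 1 share one cache, 2 and 3 share another,
# and both caches sit below a common top-level storage cache.
hierarchy = [[0, 1], [2, 3]]

mapping = assign(order_by_overlap(range(8), accesses), hierarchy)
print(mapping)
```

In this toy input, iterations with identical footprints land on the same processor, and iterations that share only some blocks land on processors under the same shared cache. The chain construction is quadratic in the number of iterations, so it is suitable only as an illustration, not for large loop nests.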