Computation mapping for multi-level storage cache hierarchies

  • Authors:
  • Mahmut Kandemir (Pennsylvania State University)
  • Sai Prashanth Muralidhara (Pennsylvania State University)
  • Mustafa Karakoy (Imperial College)
  • Seung Woo Son (Argonne National Laboratory)

  • Venue:
  • Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing

  • Year:
  • 2010

Abstract

Improving I/O performance is an important issue for many data-intensive, large-scale parallel applications. Although storage caches are used to improve the I/O latencies of parallel applications, most of the prior work has focused on the management and partitioning of cache space. In particular, the compiler's role in taking advantage of multi-level storage caches has been largely unexplored. The main contribution of this paper is a shared-storage-cache-aware loop iteration distribution (iteration-to-processor mapping) scheme for I/O-intensive applications that manipulate disk-resident data sets. The proposed scheme is compiler-directed and can be tuned to target any multi-level storage cache hierarchy. At the core of our scheme lies an iterative strategy that clusters loop iterations based on the underlying storage cache hierarchy and on the way the different storage caches in the hierarchy are shared by different processors. We tested this mapping scheme using a set of eight I/O-intensive application programs. The results collected so far are promising. Our proposed scheme improves the I/O performance of the tested applications by 26.3% on average, and this improvement leads to an average 18.9% reduction in the overall execution latencies of these applications. Moreover, our scheme performs significantly better than a state-of-the-art (but storage-cache-hierarchy-agnostic) data locality optimization scheme. We also present an enhancement to our baseline implementation that performs local scheduling once the loop iterations have been distributed. We observe that applying this enhancement improves I/O latency and total execution time further, by 30.7% and 21.9%, respectively.
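
The abstract describes the clustering only at a high level. The Python sketch below is a minimal illustration of the general idea, not the authors' compiler algorithm: it assumes a hypothetical setting in which each shared storage cache serves a fixed group of processors, summarizes each loop iteration by the set of disk blocks it accesses, and greedily maps iterations with overlapping block footprints to the same cache-sharing group (with a simple load cap for balance). The function map_iterations and its whole interface are invented for this example.

import math

def map_iterations(iteration_blocks, cache_groups):
    # iteration_blocks: dict {iteration_id: set of disk-block ids it accesses}
    # cache_groups: one entry per shared storage cache, listing the processors
    #               that share it (a hypothetical model, not the paper's input)
    # Returns {iteration_id: index of the cache-sharing group it is mapped to}.
    cap = math.ceil(len(iteration_blocks) / len(cache_groups))  # load balance
    footprint = [set() for _ in cache_groups]  # blocks assumed cached per group
    load = [0] * len(cache_groups)             # iterations assigned per group
    mapping = {}
    # Visit iterations with the largest footprints first, so that heavy
    # sharers seed the per-cache footprints.
    for it in sorted(iteration_blocks, key=lambda i: -len(iteration_blocks[i])):
        blocks = iteration_blocks[it]
        candidates = [g for g in range(len(cache_groups)) if load[g] < cap]
        # Prefer the group whose cached footprint overlaps this iteration
        # most, breaking ties toward the least-loaded group.
        best = max(candidates,
                   key=lambda g: (len(blocks & footprint[g]), -load[g]))
        mapping[it] = best
        footprint[best] |= blocks
        load[best] += 1
    return mapping

# Toy example: four iterations over three disk blocks, two storage caches,
# each shared by two processors.
iters = {0: {"b0", "b1"}, 1: {"b1"}, 2: {"b2"}, 3: {"b2", "b0"}}
groups = [["p0", "p1"], ["p2", "p3"]]
print(map_iterations(iters, groups))  # -> {0: 0, 3: 0, 1: 1, 2: 1}

In the paper's actual scheme the clustering is applied iteratively across the levels of the storage cache hierarchy; a fuller version of this sketch would recurse, reclustering each group's iterations over the next cache level down.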