Prefetch throttling and data pinning for improving performance of shared caches

Authors:
Ozcan Ozturk;Seung Woo Son;Mahmut Kandemir;Mustafa Karakoy
Affiliations:
Bilkent University;Pennsylvania State University;Pennsylvania State University;Imperial College
Venue:
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Year:
2008

Citing 36
Cited 0

The NAS parallel benchmarks—summary and preliminary results

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
SUIF: an infrastructure for research on parallelizing and optimizing compilers

ACM SIGPLAN Notices
Informed prefetching and caching

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Automatic compiler-inserted I/O prefetching for out-of-core applications

OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
A trace-driven comparison of algorithms for parallel prefetching and caching

OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
An extended two-phase method for accessing sections of out-of-core arrays

Scientific Programming
Informed multi-process prefetching and caching

SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
On the existence of a spectrum of policies that subsumes the least recently used (LRU) and least frequently used (LFU) policies

SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Optimal prefetching and caching for parallel I/O sytems

Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance

SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
SPEC CPU2000: Measuring CPU Performance in the New Millennium

Computer
Parallel Out-of-Core Cholesky and QR Factorization with POOCLAPACK

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Improving I/O response times via prefetching and storage system reorganization

COMPSAC '97 Proceedings of the 21st International Computer Software and Applications Conference
My Cache or Yours? Making Storage More Exclusive

ATEC '02 Proceedings of the General Track of the annual conference on USENIX Annual Technical Conference
The Multi-Queue Replacement Algorithm for Second Level Buffer Caches

Proceedings of the General Track: 2002 USENIX Annual Technical Conference
Discretionary Caching for I/O on Clusters

CCGRID '03 Proceedings of the 3st International Symposium on Cluster Computing and the Grid
Data Sieving and Collective I/O in ROMIO

FRONTIERS '99 Proceedings of the The 7th Symposium on the Frontiers of Massively Parallel Computation
Automatic ARIMA Time Series Modeling for Adaptive I/O Prefetching

IEEE Transactions on Parallel and Distributed Systems
A data locality optimizing algorithm

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Data Mining Methods and Models

Data Mining Methods and Models
ARC: A Self-Tuning, Low Overhead Replacement Cache

FAST '03 Proceedings of the 2nd USENIX Conference on File and Storage Technologies
CAR: Clock with Adaptive Replacement

FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
SARC: sequential prefetching in adaptive replacement cache

ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
CLOCK-Pro: an effective improvement of the CLOCK replacement

ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
DULO: an effective buffer cache management scheme to exploit both temporal and spatial locality

FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Second-tier cache management using write hints

FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Managing prefetch memory for data-intensive online servers

FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Taming the memory hogs: using compiler-inserted releases to manage physical memory intelligently

OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Karma: know-it-all replacement for a multilevel cache

FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
AMP: adaptive multi-stream prefetching in a shared cache

FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
PVFS: a parallel file system for linux clusters

ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
Working Sets Past and Present

IEEE Transactions on Software Engineering
DiskSeen: exploiting disk layout and access history to enhance I/O prefetch

ATC'07 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we (i) quantify the impact of compiler-directed I/O prefetching on shared caches at I/O nodes. The experimental data collected shows that while I/O prefetching brings some benefits, its effectiveness reduces significantly as the number of clients (compute nodes) is increased; (ii) identify interclient misses due to harmful I/O prefetches as one of the main sources for this reduction in performance with increased number of clients; and (iii) propose and experimentally evaluate prefetch throttling and data pinning schemes to improve performance of I/O prefetching. Prefetch throttling prevents one or more clients from issuing further prefetches if such prefetches are predicted to be harmful, i.e., replace from the memory cache the useful data accessed by other clients. Data pinning on the other hand makes selected data blocks immune to harmful prefetches by pinning them in the memory cache. We show that these two schemes can be applied in isolation or combined together, and they can be applied at a coarse or fine granularity. Our experiments with these two optimizations using four disk-intensive applications reveal that they can improve performance by 9.7% and 15.1% on average, over standard compiler-directed I/O prefetching and no-prefetch case, respectively, when 8 clients are used.