Optimal multistream sequential prefetching in a shared cache

Authors:
Binny S. Gill;Luis Angel D. Bathen
Affiliations:
IBM Almaden Research Center, San Jose, CA;University of California, Irvine, CA
Venue:
ACM Transactions on Storage (TOS)
Year:
2007

Abstract

Prefetching is a widely used technique in modern data storage systems. We study the most widely used class of prefetching algorithms, known as sequential prefetching. Two problems plague state-of-the-art sequential prefetching algorithms: (i) cache pollution, which occurs when prefetched data replaces more useful prefetched or demand-paged data, and (ii) prefetch wastage, which happens when prefetched data is evicted from the cache before it can be used. A sequential prefetching algorithm can have a fixed or adaptive degree of prefetch and can be either synchronous (when it can prefetch only on a miss) or asynchronous (when it can also prefetch on a hit). To capture these distinctions we define four classes of prefetching algorithms: fixed synchronous (FS), fixed asynchronous (FA), adaptive synchronous (AS), and adaptive asynchronous (AsynchA). We find that the relatively unexplored class of AsynchA algorithms is in fact the most promising for sequential prefetching. We provide a first formal analysis of the criteria necessary for optimal throughput when using an AsynchA algorithm in a cache shared by multiple steady sequential streams. We then provide a simple implementation called AMP (adaptive multistream prefetching) which adapts accordingly, leading to near-optimal performance for any kind of sequential workload and cache size. Our experimental setup consisted of an IBM xSeries 345 dual-processor server running Linux with five SCSI disks. We observe that AMP convincingly outperforms all the contending members of the FA, FS, and AS classes for any number of streams and over all cache sizes. As anecdotal evidence, in an experiment with 100 concurrent sequential streams and varying cache sizes, AMP surpasses the FA, FS, and AS algorithms by 29–172%, 12–24%, and 21–210%, respectively, while outperforming OBL by a factor of 8. Even for complex workloads like SPC1-Read, AMP is consistently the best-performing algorithm. For the SPC2 video-on-demand workload, AMP can sustain at least 25% more streams than the next best algorithm. Furthermore, for a workload consisting of short sequences, where optimality is more elusive, AMP outperforms all the other contenders in overall performance. Finally, we implemented AMP in a state-of-the-art enterprise storage system, the IBM System Storage DS8000 series. We demonstrated that AMP dramatically improves performance for common sequential and batch processing workloads and delivers up to a twofold increase in the sequential read capacity.
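
The distinction the abstract draws between synchronous and asynchronous prefetching can be illustrated with a small toy model. The sketch below is not AMP and is not taken from the paper. It assumes a single sequential stream, an LRU cache, prefetches that complete instantly, and a fixed degree of prefetch, so it covers only the FS and FA classes. The names `SequentialPrefetcher`, `degree`, and `trigger` are hypothetical, chosen for illustration. An adaptive algorithm would additionally vary `degree` and `trigger` at run time, which is not modeled here.

```python
from collections import OrderedDict


class SequentialPrefetcher:
    """Toy LRU block cache with fixed-degree sequential prefetching.

    Hypothetical illustration only: prefetches complete instantly, and the
    degree of prefetch never adapts (FS when asynchronous=False, FA otherwise).
    """

    def __init__(self, capacity, degree, asynchronous=False, trigger=0):
        assert 0 <= trigger < degree
        self.capacity = capacity
        self.degree = degree              # blocks fetched per prefetch
        self.asynchronous = asynchronous  # may a hit start a prefetch?
        self.trigger = trigger            # trigger block's distance from batch end
        # block -> first block of the next batch if this block is a trigger, else None
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _insert(self, block, next_start):
        self.cache[block] = next_start
        self.cache.move_to_end(block)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block

    def _prefetch(self, start):
        last = start + self.degree - 1
        trigger_block = last - self.trigger
        for b in range(start, last + 1):
            if b in self.cache:
                continue
            is_trigger = self.asynchronous and b == trigger_block
            self._insert(b, last + 1 if is_trigger else None)

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            next_start = self.cache[block]
            self.cache[block] = None          # a trigger fires at most once
            self.cache.move_to_end(block)
            if next_start is not None:        # asynchronous: prefetch on a hit
                self._prefetch(next_start)
        else:
            self.misses += 1
            self._insert(block, None)
            self._prefetch(block + 1)         # synchronous: prefetch on a miss


def run_stream(prefetcher, num_blocks):
    """Read blocks 0..num_blocks-1 in order and return the prefetcher."""
    for block in range(num_blocks):
        prefetcher.read(block)
    return prefetcher
```

Under these assumptions, a 100-block sequential read with `degree=4` and a 16-block cache incurs 20 misses in the synchronous variant (one miss every five blocks) and a single miss in the asynchronous variant with `trigger=1`, because each hit on a trigger block fetches the next batch before the stream reaches it. The model ignores disk latency and multiple competing streams, which are the conditions under which cache pollution and prefetch wastage arise.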