AMP: adaptive multi-stream prefetching in a shared cache

Authors:
Binny S. Gill; Luis Angel D. Bathen
Affiliations:
IBM Almaden Research Center; IBM Almaden Research Center
Venue:
FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
Year:
2007

Abstract

Prefetching is a widely used technique in modern data storage systems. We study the most widely used class of prefetching algorithms, known as sequential prefetching. Two problems plague state-of-the-art sequential prefetching algorithms: (i) cache pollution, which occurs when prefetched data replaces more useful prefetched or demand-paged data, and (ii) prefetch wastage, which happens when prefetched data is evicted from the cache before it can be used. A sequential prefetching algorithm can have a fixed or adaptive degree of prefetch and can be either synchronous (when it can prefetch only on a miss) or asynchronous (when it can also prefetch on a hit). To capture these distinctions, we define four classes of prefetching algorithms: Fixed Synchronous (FS), Fixed Asynchronous (FA), Adaptive Synchronous (AS), and Adaptive Asynchronous (AA). We find that the relatively unexplored class of AA algorithms is in fact the most promising for sequential prefetching. We provide a first formal analysis of the criteria necessary for optimal throughput when using an AA algorithm in a cache shared by multiple steady sequential streams. We then provide a simple implementation called AMP, which adapts accordingly, leading to near-optimal performance for any kind of sequential workload and cache size. Our experimental setup consisted of an IBM xSeries 345 dual-processor server running Linux with five SCSI disks. We observe that AMP convincingly outperforms all the contending members of the FA, FS, and AS classes for any number of streams and over all cache sizes. As anecdotal evidence, in an experiment with 100 concurrent sequential streams and varying cache sizes, AMP beats the FA, FS, and AS algorithms by 29-172%, 12-24%, and 21-210%, respectively, while outperforming OBL by a factor of 8. Even for complex workloads like SPC1-Read, AMP is consistently the best-performing algorithm. For the SPC2 Video-on-Demand workload, AMP can sustain at least 25% more streams than the next best algorithm. Finally, for a workload consisting of short sequences, where optimality is more elusive, AMP is able to outperform all the other contenders in overall performance.
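
The four-class taxonomy in the abstract (fixed or adaptive degree of prefetch, crossed with synchronous or asynchronous triggering) can be illustrated with a minimal Python sketch. This is not the AMP implementation, and it does not model how AMP adapts. The names `PrefetchClass`, `PrefetchPolicy`, `adaptive_degree`, `prefetch_on_hit`, `classify`, and `may_prefetch` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class PrefetchClass(Enum):
    """The four classes named in the abstract."""
    FS = "Fixed Synchronous"
    FA = "Fixed Asynchronous"
    AS = "Adaptive Synchronous"
    AA = "Adaptive Asynchronous"


@dataclass(frozen=True)
class PrefetchPolicy:
    """Hypothetical description of a sequential prefetching policy."""
    adaptive_degree: bool  # True if the degree of prefetch can change over time
    prefetch_on_hit: bool  # True if the policy may also prefetch on a cache hit

    def classify(self) -> PrefetchClass:
        # Adaptive vs. fixed picks the first letter,
        # asynchronous vs. synchronous picks the second.
        if self.adaptive_degree:
            return PrefetchClass.AA if self.prefetch_on_hit else PrefetchClass.AS
        return PrefetchClass.FA if self.prefetch_on_hit else PrefetchClass.FS

    def may_prefetch(self, is_hit: bool) -> bool:
        # Synchronous policies prefetch only on a miss;
        # asynchronous policies may prefetch on a hit as well.
        return (not is_hit) or self.prefetch_on_hit


if __name__ == "__main__":
    for adaptive in (False, True):
        for on_hit in (False, True):
            policy = PrefetchPolicy(adaptive, on_hit)
            print(policy.classify().name, policy.classify().value)
```

Under this sketch, a policy in the AA class is the only one that can both change its degree of prefetch and issue a prefetch on a hit, which is the combination the abstract identifies as the most promising.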