Why does file system prefetching work?

  • Authors:
  • Elizabeth Shriver (Information Sciences Research Center, Bell Labs, Lucent Technologies)
  • Christopher Small (Information Sciences Research Center, Bell Labs, Lucent Technologies)
  • Keith A. Smith (Harvard University)

  • Venue:
  • ATEC '99: Proceedings of the 1999 USENIX Annual Technical Conference
  • Year:
  • 1999


Abstract

Most file systems attempt to predict which disk blocks will be needed in the near future and prefetch them into memory; this technique can improve application throughput by as much as 50%. But why? The reasons include that the disk's built-in cache comes into play, that the device driver amortizes the fixed cost of an I/O operation over a larger amount of data, that total disk seek time can be decreased, and that programs can overlap computation and I/O. However, intuition does not tell us the relative benefit of each of these causes, or suggest techniques for increasing the effectiveness of prefetching. To answer these questions, we constructed an analytic performance model for file system reads. The model is based on a 4.4BSD-derived file system and is parameterized by the access patterns of the files, the layout of files on disk, and the design characteristics of the file system and of the underlying disk. We then validated the model against several simple workloads; the predictions of our model were typically within 4% of measured values, and differed from measured values by at most 9%. Using the model and experiments, we explain why and when prefetching works, and make proposals for how to tune file system and disk parameters to improve overall system throughput.
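Two of the effects named in the abstract, amortizing the fixed per-request cost over larger transfers and overlapping computation with I/O, lend themselves to a quick back-of-the-envelope illustration. The sketch below is not the paper's analytic model: the cost formula is a simplification, and the parameter values (FIXED_OVERHEAD, BANDWIDTH, COMPUTE_PER_BYTE) are illustrative assumptions rather than measurements from the paper.

```python
# Toy cost model (illustrative only): time to read a file as a function of
# request size, with and without prefetch-style overlap of CPU work and I/O.

FIXED_OVERHEAD = 0.010    # assumed seconds of fixed cost per I/O request
BANDWIDTH = 20e6          # assumed sequential transfer rate, bytes/second
COMPUTE_PER_BYTE = 30e-9  # assumed application CPU time per byte read


def read_time(total_bytes, request_size, overlap_io=False):
    """Estimate time to read total_bytes in chunks of request_size."""
    requests = total_bytes / request_size
    io_time = requests * (FIXED_OVERHEAD + request_size / BANDWIDTH)
    cpu_time = total_bytes * COMPUTE_PER_BYTE
    if overlap_io:
        # With prefetching, the next chunk is fetched while the current one
        # is processed, so run time is bounded by the slower pipeline
        # rather than by the sum of the two.
        return max(io_time, cpu_time)
    return io_time + cpu_time


total = 64 * 1024 * 1024  # read a 64 MB file
for size in (8 * 1024, 64 * 1024, 512 * 1024):
    base = read_time(total, size, overlap_io=False)
    pref = read_time(total, size, overlap_io=True)
    print(f"request={size // 1024:4d} KB  "
          f"no overlap={base:6.2f} s  with overlap={pref:6.2f} s")
```

Even this crude model reproduces the qualitative story: larger effective request sizes spread the fixed per-request cost over more bytes, and overlapping I/O with computation hides whichever of the two is cheaper, which is why the paper's more detailed model is needed to apportion the benefit among the individual causes.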