Compiler-based I/O prefetching for out-of-core applications

Authors:
Angela Demke Brown;Todd C. Mowry;Orran Krieger
Affiliations:
Computer Science Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA;Computer Science Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA;IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
2001

Abstract

Current operating systems offer poor performance when a numeric application's working set does not fit in main memory. As a result, programmers who wish to solve “out-of-core” problems efficiently are typically faced with the onerous task of rewriting an application to use explicit I/O operations (e.g., read/write). In this paper, we propose and evaluate a fully automatic technique that liberates the programmer from this task, provides high performance, and requires only minimal changes to current operating systems. In our scheme, the compiler provides the crucial information on future access patterns without burdening the programmer; the operating system supports nonbinding prefetch and release hints for managing I/O; and the operating system cooperates with a run-time layer to accelerate performance by adapting to dynamic behavior and minimizing prefetch overhead. This approach maintains the abstraction of unlimited virtual memory for the programmer, gives the compiler the flexibility to aggressively insert prefetches ahead of references, and gives the operating system the flexibility to arbitrate between the competing resource demands of multiple applications. We implemented our compiler analysis within the SUIF compiler, and used it to target implementations of our run-time and OS support on both a research system (Hurricane) and a commercial system (IRIX 6.5). Our experimental results show large performance gains for out-of-core scientific applications on both systems: more than 50% of the I/O stall time is eliminated in most cases, which translates into overall speedups of roughly twofold in many cases.
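
The division of labor described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the names `RuntimeLayer`, `prefetch`, `release`, `sum_out_of_core`, and `PREFETCH_DISTANCE` are hypothetical, the "operating system" is just a list that records forwarded hints, and the residency set stands in for whatever information the real run-time layer obtains from the OS. The loop shows the general shape of compiler-inserted hints: prefetches are issued a fixed distance ahead of the reference, and each page is released once it is no longer needed. Releasing a page immediately after use is a simplification chosen for brevity.

```python
"""Sketch of compiler-inserted prefetch/release hints with a run-time filter."""

PREFETCH_DISTANCE = 2  # pages to run ahead; hypothetical tuning value


class RuntimeLayer:
    """Hypothetical user-level layer that filters hints before they reach the OS."""

    def __init__(self, resident=()):
        self.resident = set(resident)  # stand-in for OS-provided residency info
        self.os_calls = []             # hints actually forwarded to the "OS"

    def prefetch(self, page):
        # Nonbinding hint: skip the system call if the page is already in memory.
        if page in self.resident:
            return False
        self.os_calls.append(("prefetch", page))
        self.resident.add(page)        # simulation only: assume the hint is honored
        return True

    def release(self, page):
        # Nonbinding hint: tell the OS this page is not needed soon.
        if page not in self.resident:
            return False
        self.os_calls.append(("release", page))
        self.resident.discard(page)
        return True


def sum_out_of_core(pages, runtime, distance=PREFETCH_DISTANCE):
    """Sum all values page by page, as a compiler-transformed loop might."""
    n = len(pages)
    # Prologue: issue the first `distance` prefetches before any computation.
    for p in range(min(distance, n)):
        runtime.prefetch(p)
    total = 0
    for i in range(n):
        # Steady state: prefetch `distance` pages ahead of the current reference.
        if i + distance < n:
            runtime.prefetch(i + distance)
        total += sum(pages[i])         # the original computation on page i
        if i > 0:
            runtime.release(i - 1)     # the previous page is no longer needed
    if n:
        runtime.release(n - 1)         # epilogue: release the final page
    return total


pages = [[1, 2], [3, 4], [5, 6], [7, 8]]
runtime = RuntimeLayer(resident={1})   # page 1 happens to be in memory already
result = sum_out_of_core(pages, runtime)
```

In this run the prefetch for page 1 is dropped by the run-time layer because the page is already resident, so no hint for it reaches the simulated OS. On a real system, advisory calls such as POSIX `posix_madvise` with `POSIX_MADV_WILLNEED` and `POSIX_MADV_DONTNEED` could play a similar role for memory-mapped data, though that is an assumption of this sketch and not the interface the paper describes.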