Current operating systems offer poor performance when a numeric application's working set does not fit in main memory. As a result, programmers who wish to solve “out-of-core” problems efficiently typically face the onerous task of rewriting the application to use explicit I/O operations (e.g., read/write). In this paper, we propose and evaluate a fully automatic technique that liberates the programmer from this task, provides high performance, and requires only minimal changes to current operating systems. In our scheme, the compiler provides the crucial information on future access patterns without burdening the programmer; the operating system supports nonbinding prefetch and release hints for managing I/O; and the operating system cooperates with a run-time layer to accelerate performance by adapting to dynamic behavior and minimizing prefetch overhead. This approach maintains the abstraction of unlimited virtual memory for the programmer, gives the compiler the flexibility to aggressively insert prefetches ahead of references, and gives the operating system the flexibility to arbitrate between the competing resource demands of multiple applications. We implemented our compiler analysis within the SUIF compiler, and used it to target implementations of our run-time and OS support on both a research system (Hurricane) and a commercial system (IRIX 6.5). Our experimental results show large performance gains for out-of-core scientific applications on both systems: in most cases more than 50% of the I/O stall time is eliminated, which translates into overall speedups of roughly twofold in many cases.
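To make the idea of nonbinding prefetch and release hints concrete, the following is a minimal hand-written sketch, not the interface implemented on Hurricane or IRIX. It assumes a memory-mapped file stands in for an out-of-core array and uses Python's `mmap.madvise` (with `MADV_WILLNEED` and `MADV_DONTNEED`) as a rough analogue of the hints. The names `BLOCK`, `DISTANCE`, `hint`, and `sum_out_of_core` are hypothetical, and the hint placement that the compiler would derive automatically is written by hand here.

```python
import mmap

# Hypothetical tuning parameters (assumptions, not values from the paper).
BLOCK = 16 * mmap.PAGESIZE  # bytes per block; a multiple of the page size
DISTANCE = 2                # how many blocks ahead to prefetch


def hint(mapping, advice_name, start, length):
    """Issue a nonbinding hint; return False if it could not be issued.

    The hint is advisory only: the program stays correct whether or not
    the operating system supports it or acts on it.
    """
    advice = getattr(mmap, advice_name, None)
    if advice is None or not hasattr(mapping, "madvise"):
        return False  # platform lacks madvise or this advice value
    if start >= len(mapping):
        return False  # nothing left to prefetch or release
    try:
        mapping.madvise(advice, start, length)
    except OSError:
        return False
    return True


def sum_out_of_core(path):
    """Sum every byte of a non-empty file through a read-only mapping."""
    total = 0
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        size = len(m)
        # Prologue: request the first DISTANCE blocks before computing.
        for start in range(0, min(size, DISTANCE * BLOCK), BLOCK):
            hint(m, "MADV_WILLNEED", start, BLOCK)
        for start in range(0, size, BLOCK):
            # Prefetch hint: a block DISTANCE iterations ahead.
            hint(m, "MADV_WILLNEED", start + DISTANCE * BLOCK, BLOCK)
            # Ordinary memory references; no explicit read() calls.
            total += sum(m[start:start + BLOCK])
            # Release hint: this block will not be referenced again.
            hint(m, "MADV_DONTNEED", start, BLOCK)
    return total
```

Calling `sum_out_of_core(path)` on any non-empty file returns the sum of its bytes. Because the hints are nonbinding, removing every `hint` call leaves the result unchanged and affects only paging behavior, which is the property that lets the operating system arbitrate among competing applications.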