Hiding I/O latency with pre-execution prefetching for parallel applications

Authors:
Yong Chen;Surendra Byna;Xian-He Sun;Rajeev Thakur;William Gropp
Affiliations:
Illinois Institute of Technology, Chicago, IL;Illinois Institute of Technology, Chicago, IL;Illinois Institute of Technology, Chicago, IL;Argonne National Laboratory, Argonne, IL;University of Illinois Urbana-Champaign, Urbana, IL
Venue:
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Year:
2008

Abstract

Parallel applications can usually achieve high computational performance but suffer from high latency in I/O accesses. I/O prefetching is an effective way to mask this latency. Most existing I/O prefetching techniques, however, are conservative, and their effectiveness is limited by low accuracy and coverage. As the processor-I/O performance gap has been widening rapidly, data-access delay has become a dominant performance bottleneck. We argue that it is time to revisit the "I/O wall" problem and trade excess computing power for data-access speed. We propose a novel pre-execution approach for masking I/O latency. We describe the pre-execution I/O prefetching framework, the methodology for constructing pre-execution threads, the underlying library support, and a prototype implementation in ROMIO, the MPI-IO implementation in MPICH2. Preliminary experiments show that the pre-execution approach is promising for reducing I/O access latency.
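
The general idea of pre-execution prefetching can be illustrated with a minimal Python sketch. It is not the authors' ROMIO/MPICH2 prototype, and it assumes a simplified model: a helper thread runs only the I/O-relevant slice of the program (here, the sequence of block ids to be read) ahead of the main thread and places the data in a shared cache. All names (`slow_read`, `PrefetchCache`, `access_pattern`, `run`) and the simulated latencies are hypothetical and chosen for illustration only.

```python
import threading
import time


def slow_read(block_id, latency=0.01):
    """Hypothetical stand-in for a high-latency I/O request."""
    time.sleep(latency)
    return f"data-{block_id}"


class PrefetchCache:
    """Thread-safe cache shared by the main and pre-execution threads."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()
        self.hits = 0
        self.misses = 0

    def put(self, block_id, value):
        with self._lock:
            self._data[block_id] = value

    def read(self, block_id):
        with self._lock:
            if block_id in self._data:
                self.hits += 1
                return self._data[block_id]
            self.misses += 1
        # Cache miss: fall back to the slow path outside the lock.
        return slow_read(block_id)


def access_pattern(n):
    """The I/O-relevant slice: which blocks are read, in program order."""
    return [(i * 3) % n for i in range(n)]


def pre_execution_thread(cache, n):
    # Executes only the slice that generates I/O requests and skips the
    # computation, so it can run ahead of the main thread.
    for block_id in access_pattern(n):
        cache.put(block_id, slow_read(block_id))


def main_thread(cache, n, compute_time=0.02):
    results = []
    for block_id in access_pattern(n):
        results.append(cache.read(block_id))  # served from cache if prefetched
        time.sleep(compute_time)              # simulated computation
    return results


def run(n=8):
    cache = PrefetchCache()
    helper = threading.Thread(target=pre_execution_thread, args=(cache, n))
    helper.start()
    results = main_thread(cache, n)
    helper.join()
    return results, cache


if __name__ == "__main__":
    results, cache = run()
    print(results)
    print("hits:", cache.hits, "misses:", cache.misses)
```

In this sketch the main thread always obtains correct data, whether or not the helper has fetched a block in time; only the hit/miss split depends on timing. Because the helper skips the computation, it tends to pull ahead of the main thread, which is the source of the latency hiding in this simplified model.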