Improving MPI-IO Output Performance with Active Buffering Plus Threads

Authors:
Xiaosong Ma;Marianne Winslett;Jonghyun Lee;Shengke Yu
Venue:
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Year:
2003

Citing 0
Cited 18

GODIVA: Lightweight Data Management for Scientific Visualization Applications
ICDE '04 Proceedings of the 20th International Conference on Data Engineering

High Performance Threaded Data Streaming for Large Scale Simulations
GRID '04 Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing

Grid-Based Parallel Data Streaming implemented for the Gyrokinetic Toroidal Code
Proceedings of the 2003 ACM/IEEE conference on Supercomputing

Fast Parallel Non-Contiguous File Access
Proceedings of the 2003 ACM/IEEE conference on Supercomputing

Cross-Platform Performance Prediction of Parallel Applications Using Partial Execution
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing

High-Level Buffering for Hiding Periodic Output Cost in Scientific Simulations
IEEE Transactions on Parallel and Distributed Systems

Cooperative Client-Side File Caching for MPI Applications
International Journal of High Performance Computing Applications

Using MPI file caching to improve parallel write performance for large-scale scientific applications
Proceedings of the 2007 ACM/IEEE conference on Supercomputing

Dynamically adapting file domain partitioning methods for collective I/O based on underlying parallel file system locking protocols
Proceedings of the 2008 ACM/IEEE conference on Supercomputing

Improving Parallel Write by Node-Level Request Scheduling
CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid

Multiple-Level MPI File Write-Back and Prefetching for Blue Gene Systems
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface

Implementation and Evaluation of File Write-Back and Prefetching for MPI-IO Over GPFS
International Journal of High Performance Computing Applications

Accelerating I/O Forwarding in IBM Blue Gene/P Systems
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

Topology-aware data movement and staging for I/O acceleration on Blue Gene/P supercomputing systems
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis

Cooperative write-behind data buffering for MPI i/o
PVM/MPI'05 Proceedings of the 12th European PVM/MPI users' group conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface

Improving collective I/O performance using pipelined two-phase I/O
Proceedings of the 2012 Symposium on High Performance Computing

Improving collective I/O performance by pipelining request aggregation and file access
Proceedings of the 20th European MPI Users' Group Meeting

ACIC: automatic cloud I/O configurator for HPC applications
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Abstract

Efficient collective output of intermediate results to secondary storage is increasingly important for scientific simulations as the gap between processing power and interconnect bandwidth on one side and I/O system bandwidth on the other widens. Dedicated I/O servers can offload I/O from compute processors and shorten execution time, but it is not always possible or easy for an application to use them. We propose active buffering with threads (ABT), which overlaps I/O with computation efficiently and flexibly without dedicated I/O servers. We show that implementing ABT in ROMIO, a popular implementation of MPI-IO, greatly reduces the application-visible cost of ROMIO's collective write calls and improves an application's overall performance by hiding I/O cost and reducing the implicit synchronization overhead of collective write operations. Further, ABT is high-level, platform-independent, and transparent to users, so users benefit from overlapping I/O with other processing tasks even when the file system or parallel I/O library does not support asynchronous I/O.
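
The general idea of hiding write cost behind a background thread can be illustrated with a minimal sketch. This is not the ROMIO implementation described in the paper; it is a single-process Python analogy, and the names `BufferedWriter`, `write_at`, `max_buffers`, and `run_demo` are hypothetical. The sketch assumes that a write call copies its data into bounded in-memory buffer space and returns at once, while a separate thread drains the buffers to the file. Blocking when the buffer space is full is a simplification chosen for this sketch, not a claim about how ABT handles overflow.

```python
import os
import queue
import tempfile
import threading


class BufferedWriter:
    """Hypothetical helper: buffers writes in memory and flushes them
    to a file from a background thread."""

    def __init__(self, path, max_buffers=8):
        self._file = open(path, "wb")
        # Bounded queue stands in for a limited amount of buffer space.
        self._queue = queue.Queue(maxsize=max_buffers)
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def write_at(self, offset, data):
        # Copy the data so the caller can reuse its own buffer immediately.
        # put() blocks if the buffer space is full (sketch assumption).
        self._queue.put((offset, bytes(data)))

    def _drain(self):
        # Background thread: perform the actual file writes.
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: no more data will arrive
                break
            offset, data = item
            self._file.seek(offset)
            self._file.write(data)

    def close(self):
        # Wait for all buffered data to reach the file, then close it.
        self._queue.put(None)
        self._thread.join()
        self._file.close()


def run_demo():
    path = os.path.join(tempfile.mkdtemp(), "out.bin")
    writer = BufferedWriter(path)
    for step in range(4):
        block = bytes([step]) * 16  # stand-in for one output step
        writer.write_at(step * 16, block)
        # The next computation phase would run here while the
        # background thread writes the previous block.
    writer.close()
    with open(path, "rb") as f:
        return f.read()
```

In this sketch the cost visible to the caller of `write_at` is a memory copy plus a queue operation; the file write itself happens concurrently with whatever the caller does next, and `close` is the only point that waits for outstanding data.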