Improving Collective I/O Performance Using Threads

Authors:
Phillip M. Dickens; Rajeev Thakur
Venue:
IPPS '99/SPDP '99 Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing
Year:
1999

Abstract

Massively parallel computers are increasingly used to run large, I/O-intensive applications in many fields. For such applications, I/O requirements often present a significant obstacle to good performance, and an important area of current research is the development of techniques to reduce these costs. One such approach is collective I/O, in which the processors cooperatively develop an I/O strategy that reduces the number, and increases the size, of I/O requests, making much better use of the I/O subsystem. Collective I/O has been shown to significantly reduce the cost of I/O in many large parallel applications, and it therefore serves as an important base on which to explore other mechanisms that can reduce these costs further. One promising approach is to use threads to perform the collective I/O in the background while the main thread continues with other computation in the foreground. In this paper, we explore the issues associated with implementing collective I/O in the background using threads.

The most natural approach is simply to spawn an I/O thread that performs the collective I/O in the background while the main thread continues with other computation. However, our research demonstrates that this approach is frequently the worst implementation option, often performing much worse than executing collective I/O entirely in the foreground. To improve the performance of thread-based collective I/O, we developed an alternative approach in which part of the collective I/O operation is performed in the background and part in the foreground. We demonstrate that this new technique can significantly improve the performance of thread-based collective I/O, providing up to an 80% improvement over sequential collective I/O (where there is no attempt to overlap computation with I/O). We also discuss one very important application of this research: the implementation of the split-collective parallel I/O operations defined in MPI 2.0.
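
The idea of splitting one I/O operation between foreground and background can be illustrated with a small single-process sketch. It is not the authors' implementation and involves no MPI: the "processes" are simulated as entries in a list, and all function names (`exchange_phase`, `begin_write`, `end_write`, `compute`) are hypothetical. The abstract does not say which part of the operation runs in the foreground, so placing the data merge in the foreground and the file write in the background is an assumption made here for illustration. The begin/end pairing loosely mirrors the shape of split-collective calls such as `MPI_File_write_all_begin` and `MPI_File_write_all_end`.

```python
import os
import tempfile
import threading


def exchange_phase(per_process_chunks):
    """Foreground step: merge many small pieces into one large buffer.

    Stands in for request aggregation in a collective write. The
    "processes" are simulated list entries (assumption for illustration).
    """
    return b"".join(per_process_chunks)


def begin_write(path, buffer):
    """Start writing the buffer on a background thread; return a handle."""
    def write_all():
        with open(path, "wb") as f:
            f.write(buffer)

    worker = threading.Thread(target=write_all)
    worker.start()
    return worker


def end_write(worker):
    """Block until the background write has finished."""
    worker.join()


def compute(n):
    """Placeholder for the application's foreground computation."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    chunks = [bytes([rank]) * 4 for rank in range(4)]  # 4 simulated processes
    path = os.path.join(tempfile.mkdtemp(), "out.bin")

    buffer = exchange_phase(chunks)     # runs in the foreground
    handle = begin_write(path, buffer)  # file write proceeds in the background
    result = compute(1000)              # may overlap with the write
    end_write(handle)                   # the file is complete after this call

    print(result, os.path.getsize(path))
```

The buffer must not be modified between `begin_write` and `end_write`, which is the same kind of restriction that split-collective interfaces place on the caller's buffer while the operation is outstanding.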