This paper presents the design and evaluation of MTIO (Multi-Threaded Input/Output), a multi-threaded runtime library for parallel I/O. We extend the multi-threading concept to place computation and I/O in two separate threads of control. Multi-threading in our design permits (a) asynchronous I/O even when the underlying file system does not support asynchronous I/O; (b) avoidance of data copies between the I/O thread and the compute thread, because both share one address space; and (c) collective I/O performed asynchronously without blocking the compute threads. Further, this paper presents techniques for collective I/O that jointly maximize load balance and concurrency while reducing communication overhead. Performance results on an IBM SP2 are presented for various data distributions and access patterns. The results show a tradeoff between the amount of I/O concurrency and the buffer size designated for I/O, and that there is an optimal buffer size beyond which the benefit of larger requests diminishes because of large communication overheads.
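The core idea behind points (a) and (b), a dedicated I/O thread that performs ordinary blocking writes while the compute thread keeps working and hands over buffers by reference, can be sketched as follows. This is a minimal illustration under stated assumptions, not MTIO's actual interface: the `AsyncWriter` class and its `acquire`/`submit`/`close` methods are hypothetical names, and collective I/O, MPI, and data distributions are not modeled. The buffers are recycled through a small fixed pool, so `nbufs` bounds how many requests can be in flight at once and `buf_size` sets the size of each request. CPython releases the GIL during blocking file writes, so the overlap with computation is real for file I/O.

```python
import io
import queue
import threading


class AsyncWriter:
    """Hypothetical sketch (not the MTIO API): a dedicated I/O thread drains
    filled buffers so the compute thread never blocks inside a write call."""

    def __init__(self, f, buf_size, nbufs=2):
        self._f = f
        self._free = queue.Queue()   # buffers the compute thread may fill
        self._full = queue.Queue()   # (buffer, nbytes) pairs awaiting I/O
        for _ in range(nbufs):
            self._free.put(bytearray(buf_size))
        self._io = threading.Thread(target=self._io_loop, daemon=True)
        self._io.start()

    def _io_loop(self):
        while True:
            item = self._full.get()
            if item is None:         # sentinel: no more requests
                break
            buf, n = item
            # The blocking write runs here, off the compute thread's path.
            self._f.write(memoryview(buf)[:n])
            self._free.put(buf)      # return the buffer to the pool

    def acquire(self):
        # Blocks only when every buffer is still queued for or under I/O.
        return self._free.get()

    def submit(self, buf, n):
        # Pass the buffer object itself: both threads share one address
        # space, so nothing is copied between compute and I/O threads.
        self._full.put((buf, n))

    def close(self):
        self._full.put(None)
        self._io.join()


# Usage: three "time steps", each producing a 4-byte result.
out = io.BytesIO()
w = AsyncWriter(out, buf_size=4, nbufs=2)
for step in range(3):
    buf = w.acquire()
    data = f"s{step:03d}".encode()   # stand-in for a computed result
    buf[:len(data)] = data
    w.submit(buf, len(data))
w.close()
print(out.getvalue())  # b's000s001s002'
```

Because a single I/O thread drains a FIFO queue, writes reach the file in submission order. In this sketch, raising `nbufs` lets the compute thread run further ahead of the I/O thread at the cost of more memory devoted to buffering.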