We present the design, implementation, and evaluation of a runtime system based on collective I/O techniques for irregular applications. The design is motivated by the requirements of a large number of science and engineering applications, including Teraflops-scale applications, in which data must be reorganized into a canonical form for further processing or for restarts. We present two designs: "collective I/O" and "pipelined collective I/O." In the first design, all processors participate in I/O simultaneously, which simplifies the scheduling of I/O requests but can create contention at the I/O nodes. In the second design, processors are organized into several groups so that only one group performs I/O at a time while the next group carries out the communication needed to rearrange data; this entire process is dynamically pipelined to reduce I/O-node contention. In other words, the design provides support for dynamic contention management. We also present a software caching method that uses collective I/O to reduce I/O cost by reusing data already present in the memory of other nodes. Chunking and on-line compression mechanisms are included in both models. We present performance results on the Intel Paragon at Caltech and on the ASCI/Red Teraflops machine at Sandia National Laboratories.
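The pipelining idea described above can be illustrated with a small scheduling sketch. This is a minimal, hypothetical model (not the paper's actual runtime system): each processor group first communicates to rearrange its data into canonical form, then writes it; the schedule staggers the groups so that while group *g* performs I/O, group *g+1* overlaps its communication phase, and at most one group touches the I/O nodes at any step. The function name and schedule representation are assumptions for illustration only.

```python
def pipelined_schedule(num_groups):
    """Build a two-stage pipeline schedule for `num_groups` processor groups.

    Each group passes through two phases:
      - "communicate": exchange data with other processors to rearrange it
        into canonical order (hypothetical stand-in for the paper's
        data-reorganization step),
      - "io": write the rearranged data to the I/O nodes.

    Group g communicates at step g and performs I/O at step g + 1, so
    communication of one group overlaps the I/O of the previous group,
    and no two groups perform I/O in the same step (reducing contention).
    Returns a list of steps; each step is a list of (group, phase) pairs.
    """
    schedule = []
    for step in range(num_groups + 1):
        active = []
        if step < num_groups:
            active.append((step, "communicate"))  # group `step` rearranges data
        if step >= 1:
            active.append((step - 1, "io"))       # previous group writes
        schedule.append(active)
    return schedule


# Example: three groups yield a four-step pipeline in which every step
# has at most one group doing I/O, while another overlaps communication.
for step, active in enumerate(pipelined_schedule(3)):
    print(step, active)
```

The key property of the sketch is visible in the output: except for the fill and drain steps at the ends of the pipeline, every step pairs one group's communication with another group's I/O, which is the overlap the pipelined design exploits to hide communication cost and serialize access to the I/O nodes.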