Improved parallel I/O via a two-phase run-time access strategy

Authors:
Juan Miguel del Rosario;Rajesh Bordawekar;Alok Choudhary
Venue:
ACM SIGARCH Computer Architecture News - Special issue on input/output in parallel computer systems
Year:
1993

Citing 1
Cited 36

Performance measurement of the concurrent file system of the Intel iPSC/2 hypercube

Journal of Parallel and Distributed Computing - Special issue on parallel I/O systems

Parallel file systems for the IBM SP computers

IBM Systems Journal
Flexibility and performance of parallel file systems

ACM SIGOPS Operating Systems Review
The Vesta parallel file system

ACM Transactions on Computer Systems (TOCS)
Efficient data-parallel files via automatic mode detection

Proceedings of the fourth workshop on I/O in parallel and distributed systems: part of the federated computing research conference
ENWRICH: a compute-processor write caching scheme for parallel file systems

Proceedings of the fourth workshop on I/O in parallel and distributed systems: part of the federated computing research conference
Disk-directed I/O for MIMD multiprocessors

ACM Transactions on Computer Systems (TOCS)
Implementation of collective I/O in the Intel Paragon parallel file system: initial experiences

ICS '97 Proceedings of the 11th international conference on Supercomputing
On implementing MPI-IO portably and with high performance

Proceedings of the sixth workshop on I/O in parallel and distributed systems
The impact of spatial layout of jobs on parallel I/O performance

Proceedings of the sixth workshop on I/O in parallel and distributed systems
Informed prefetching of collective input/output requests

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
A case for using MPI's derived datatypes to improve I/O performance

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
PDS/PIO: lightweight libraries for collective parallel I/O

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Parallel simulation of parallel file systems and I/O programs

SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Active buffering plus compressed migration: an integrated solution to parallel simulations' data transport needs

ICS '02 Proceedings of the 16th international conference on Supercomputing
Placement of I/O servers to improve parallel I/O performance on switch-based clusters

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
References

Sourcebook of parallel computing
An Adaptive Cache Coherence Protocol Specification for Parallel Input/Output Systems

IEEE Transactions on Parallel and Distributed Systems
A study of I/O methods for parallel visualization of large-scale data

Parallel Computing - Parallel graphics and visualization
Adaptive parallel I/O scheduling algorithm for multiprogrammed systems

Future Generation Computer Systems - Parallel input/output management techniques (PIOMT) in cluster and grid computing
Disk-directed I/O for MIMD multiprocessors

OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Noncontiguous locking techniques for parallel file systems

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Scaling parallel I/O performance through I/O delegate and caching system

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Optimizing server placement for parallel I/O in switch-based clusters

Journal of Parallel and Distributed Computing
Improving Parallel Write by Node-Level Request Scheduling

CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
An expandable parallel file system using NFS servers

VECPAR'02 Proceedings of the 5th international conference on High performance computing for computational science
Automated tracing of I/O stack

EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Design and implementation of parallel file aggregation mechanism

Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers
A new I/O architecture for improving the performance in large scale clusters

ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V
Compression-aware I/O performance analysis for big data clustering

Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
Efficient data restructuring and aggregation for I/O acceleration in PIDX

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Scalable in situ scientific data encoding for analytical query processing

Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Optimized process placement for collective I/O operations

Proceedings of the 20th European MPI Users' Group Meeting
Insights for exascale IO APIs from building a petascale IO API

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Characterization and modeling of PIDX parallel I/O for performance optimization

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Scalable model of parallel computations for applications with intensive input-output

Journal of Computer and Systems Sciences International

Abstract

As scientists expand their models to describe physical phenomena of increasingly large extent, I/O becomes crucial, and a system with limited I/O capacity can severely constrain the performance of the entire program. We provide experimental results, obtained on an Intel Touchstone Delta and an nCUBE 2 I/O system, to show that the performance of existing parallel I/O systems can vary by several orders of magnitude as a function of the data access pattern of the parallel program. We then propose a two-phase access strategy, to be implemented in a runtime system, in which the data distribution on computational nodes is decoupled from the storage distribution. Our experimental results show that performance improvements of several orders of magnitude over direct-access-based data distribution methods can be obtained, and that performance for most data access patterns can be improved to within a factor of 2 of the best performance. Further, the cost of redistribution is a very small fraction of the overall access cost.
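
The idea of decoupling the data distribution on compute nodes from the storage distribution can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a row-major N x N matrix stored in one file, P processes that each need a block of columns, and N divisible by P. The function names (`direct_access`, `two_phase_access`) and the request counter are hypothetical and exist only for this illustration. Direct access issues one small request per row per process, while the two-phase variant first reads one contiguous chunk per process (matching the storage layout) and then redistributes in memory.

```python
def direct_access(file_data, n, p):
    """Each process reads its column block straight from the row-major file."""
    width = n // p  # assumes n is divisible by p
    requests = 0
    blocks = []
    for rank in range(p):
        block = []
        for row in range(n):
            start = row * n + rank * width
            block.append(file_data[start:start + width])  # one small request per row
            requests += 1
        blocks.append(block)
    return blocks, requests


def two_phase_access(file_data, n, p):
    """Phase 1: contiguous reads matching the file layout. Phase 2: redistribute."""
    height = n // p  # rows staged per process (assumes n is divisible by p)
    width = n // p   # columns each process finally owns
    requests = 0

    # Phase 1: every process reads one contiguous chunk of whole rows.
    staged = []
    for rank in range(p):
        start = rank * height * n
        chunk = file_data[start:start + height * n]  # one large request
        requests += 1
        staged.append([chunk[r * n:(r + 1) * n] for r in range(height)])

    # Phase 2: in-memory exchange so each process ends up with its column block.
    # On a real machine this would be interprocess communication, not list slicing.
    blocks = []
    for rank in range(p):
        block = []
        for owner in range(p):
            for row in staged[owner]:
                block.append(row[rank * width:(rank + 1) * width])
        blocks.append(block)
    return blocks, requests


n, p = 8, 4
file_data = list(range(n * n))
direct_blocks, direct_requests = direct_access(file_data, n, p)
staged_blocks, staged_requests = two_phase_access(file_data, n, p)
```

Both functions produce the same per-process column blocks, but in this toy setting the direct version issues N x P file requests and the two-phase version issues only P. The request count is only a rough proxy for cost; the measurements reported in the abstract come from real hardware, not from a model like this one.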