Current approaches to parallel I/O demand extensive user effort to obtain acceptable performance. This is due in part to the difficulty of understanding the characteristics of a wide variety of I/O devices and in part to the inherent complexity of I/O software. While parallel I/O systems provide users with environments in which persistent data sets can be shared among parallel processors, the ultimate performance of I/O-intensive codes depends largely on the relation between the data access patterns exhibited by the parallel processors and the storage patterns of data in files and on disks. When access patterns and storage patterns match, we can exploit parallel I/O hardware by allowing each processor to perform independent parallel I/O. To maintain acceptable performance when access patterns and storage patterns do not match, several I/O optimization techniques have been developed in recent years. Collective I/O is one such technique: it enables each processor to do I/O on behalf of other processors if doing so improves overall performance. While it is generally accepted that collective I/O and its variants can bring impressive improvements in I/O performance, it is difficult for the programmer to use collective I/O in an optimal manner. In this paper, we propose and evaluate a compiler-directed collective I/O approach that detects opportunities for collective I/O and automatically inserts the necessary I/O calls in the code. An important characteristic of the approach is that, instead of applying collective I/O indiscriminately, it uses collective I/O selectively, only in cases where independent parallel I/O would not be possible or would lead to an excessive number of I/O calls. The approach involves compiler-directed access pattern and storage pattern detection schemes that work in a multiple-application environment.
We have implemented the necessary algorithms in a source-to-source translator and within a stand-alone tool. Our experimental results on an SGI/Cray Origin 2000 multiprocessor machine demonstrate that our compiler-directed collective I/O scheme performs very well on different setups built using nine applications from several scientific benchmarks. We have also observed that the I/O performance of our approach is only 5.23 percent worse than an optimal scheme.
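The selective use of collective I/O described above can be illustrated with a small sketch. It is not the paper's algorithm. It is a simplified model that assumes an n x n array stored in a single file, a block distribution over p processors with n divisible by p, and a two-phase style of collective I/O in which each processor reads the block that conforms to the file layout and the data is then redistributed in memory. The function names and the `max_calls` threshold are hypothetical.

```python
def owned_offsets(n, p, rank, access, storage):
    """File offsets of the elements that processor `rank` needs.

    access:  "row" or "column" block distribution of the n x n array.
    storage: "row" or "column" major layout of the array in the file.
    Assumes n is divisible by p (illustrative simplification).
    """
    block = n // p
    lo, hi = rank * block, (rank + 1) * block
    offsets = []
    for i in range(n):
        for j in range(n):
            owner_index = i if access == "row" else j
            if lo <= owner_index < hi:
                offsets.append(i * n + j if storage == "row" else j * n + i)
    return sorted(offsets)


def count_runs(offsets):
    """Number of contiguous runs, i.e. I/O calls needed to fetch the offsets."""
    runs = 0
    prev = None
    for off in offsets:
        if prev is None or off != prev + 1:
            runs += 1
        prev = off
    return runs


def choose_strategy(n, p, access, storage, max_calls=1):
    """Pick independent I/O when it is cheap, collective I/O otherwise.

    Returns (strategy, I/O calls per processor). `max_calls` is a
    hypothetical threshold for "an excessive number of I/O calls".
    """
    independent = max(
        count_runs(owned_offsets(n, p, r, access, storage)) for r in range(p)
    )
    if independent <= max_calls:
        return "independent", independent
    # Two-phase collective I/O (assumed variant): each processor reads the
    # block that matches the storage pattern, then data is exchanged in memory.
    collective = max(
        count_runs(owned_offsets(n, p, r, storage, storage)) for r in range(p)
    )
    return "collective", collective


if __name__ == "__main__":
    print(choose_strategy(8, 4, "row", "row"))
    print(choose_strategy(8, 4, "column", "row"))
```

In this model, a matching access and storage pattern needs one call per processor, so independent I/O is kept. A column-block access to a row-major file needs one call per row, so the sketch switches to collective I/O. The cost of the in-memory redistribution is ignored here, which a real decision procedure would have to weigh.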