Compiler-Directed Collective-I/O

Authors: Mahmut Kandemir
Affiliations: Pennsylvania State Univ., University Park
Venue: IEEE Transactions on Parallel and Distributed Systems
Year: 2001

Abstract

Current approaches to parallel I/O demand extensive user effort to obtain acceptable performance. This is due partly to the difficulty of understanding the characteristics of a wide variety of I/O devices and partly to the inherent complexity of I/O software. Parallel I/O systems give users environments in which persistent data sets can be shared among parallel processors, but the ultimate performance of I/O-intensive codes depends largely on the relation between the data access patterns exhibited by the processors and the storage patterns of the data in files and on disks. When access patterns and storage patterns match, the parallel I/O hardware can be exploited by letting each processor perform independent parallel I/O. To maintain acceptable performance when they do not match, several I/O optimization techniques have been developed in recent years. Collective I/O is one such technique: it allows each processor to perform I/O on behalf of other processors when doing so improves overall performance. Although it is generally accepted that collective I/O and its variants can bring impressive improvements in I/O performance, it is difficult for the programmer to use collective I/O optimally. In this paper, we propose and evaluate a compiler-directed collective I/O approach that detects opportunities for collective I/O and automatically inserts the necessary I/O calls into the code. An important characteristic of the approach is that, instead of applying collective I/O indiscriminately, it uses collective I/O selectively, only where independent parallel I/O would not be possible or would lead to an excessive number of I/O calls. The approach relies on compiler-directed access pattern and storage pattern detection schemes that work in a multiple-application environment.
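To make the independent-versus-collective trade-off concrete, the following is a minimal Python sketch of generic two-phase collective I/O: each processor first reads a chunk that conforms to the storage layout, then the data are redistributed in memory. It assumes an n x n array stored row-major in a single file, block distributions over p processors with p dividing n, and a hypothetical per-processor request budget (`max_calls_per_proc`) as the selection rule. All function names are illustrative; this is not the paper's compiler algorithm, which detects access and storage patterns automatically.

```python
"""Illustrative sketch (hypothetical names): independent vs. two-phase
collective reads of an n x n array stored row-major in a single file."""


def block_ranges(n, p):
    """Split range(n) into p equal contiguous blocks (assumes p divides n)."""
    if n % p:
        raise ValueError("this sketch assumes p divides n")
    size = n // p
    return [(k * size, (k + 1) * size) for k in range(p)]


def runs_for_block(n, row_range, col_range):
    """Number of contiguous file requests needed to read a rectangular block
    of a row-major n x n array: full-width rows form one run, otherwise each
    row of the block is a separate run."""
    (r0, r1), (c0, c1) = row_range, col_range
    if r0 >= r1 or c0 >= c1:
        return 0
    if c0 == 0 and c1 == n:
        return 1
    return r1 - r0


def independent_calls(n, p, distribution):
    """Requests per processor when each reads only its own block.
    distribution is "row" (matches row-major storage) or "col" (does not)."""
    calls = []
    for lo, hi in block_ranges(n, p):
        if distribution == "row":
            calls.append(runs_for_block(n, (lo, hi), (0, n)))
        else:
            calls.append(runs_for_block(n, (0, n), (lo, hi)))
    return calls


def choose_strategy(n, p, distribution, max_calls_per_proc):
    """Hypothetical selective rule: use collective I/O only when
    independent I/O would exceed a per-processor request budget."""
    worst = max(independent_calls(n, p, distribution))
    return "collective" if worst > max_calls_per_proc else "independent"


def two_phase_read(file_data, n, p):
    """Simulate a two-phase collective read where processor k wants column
    block k of the row-major array held in the flat list file_data."""
    blocks = block_ranges(n, p)
    # Phase 1 (I/O): processor k reads row block k in one contiguous request.
    staged = [file_data[r0 * n:r1 * n] for r0, r1 in blocks]
    # Phase 2 (communication): processor k collects its columns from every
    # staging buffer, in global row order.
    result = []
    for c0, c1 in blocks:
        mine = []
        for (r0, r1), buf in zip(blocks, staged):
            for r in range(r1 - r0):
                mine.append(buf[r * n + c0:r * n + c1])
        result.append(mine)
    return result


if __name__ == "__main__":
    n, p = 1024, 8
    for dist in ("row", "col"):
        calls = max(independent_calls(n, p, dist))
        print(dist, calls, choose_strategy(n, p, dist, max_calls_per_proc=4))
```

With n = 1024 and p = 8, a row-block access matches the row-major storage and needs one request per processor, so independent I/O is kept. A column-block access would need 1024 separate requests per processor, so the rule switches to collective I/O, whose first phase needs a single request per processor at the cost of an all-to-all redistribution in the second phase. The budget of 4 is an arbitrary choice for illustration.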
We have implemented the necessary algorithms in a source-to-source translator and within a stand-alone tool. Our experimental results on an SGI/Cray Origin 2000 multiprocessor demonstrate that our compiler-directed collective I/O scheme performs very well on different setups built from nine applications drawn from several scientific benchmarks. We also observed that the I/O performance of our approach is only 5.23 percent worse than that of an optimal scheme.