A model and compilation strategy for out-of-core data parallel programs

Authors:
Rajesh Bordawekar (NPAC and ECE Dept., Syracuse University)
Alok Choudhary (NPAC and ECE Dept., Syracuse University)
Ken Kennedy (CRPC and CS Dept., Rice University)
Charles Koelbel (CRPC and CS Dept., Rice University)
Michael Paleczny (CRPC and CS Dept., Rice University)
Venue:
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
1995

Abstract

It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We present a data storage model that allows processors independent access to their own data and a corresponding compilation strategy that integrates data-parallel computation with data distribution for out-of-core problems. Our results compare several communication methods and I/O optimizations using two out-of-core problems, Jacobi iteration and LU factorization.
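
To make the out-of-core idea concrete, the following is a minimal sketch of one 1-D Jacobi sweep over an array that lives in a file and is processed one slab at a time. It is an illustration under stated assumptions, not the compilation strategy or storage model of the paper: a single file stands in for one processor's own data, there is no interprocessor communication, and every name here (`write_local_file`, `read_slab`, `jacobi_sweep_out_of_core`, `slab_size`) is hypothetical.

```python
import os
import tempfile
from array import array

ITEM = array("d").itemsize  # bytes per stored double


def write_local_file(path, values):
    """Store one processor's array in its own file (hypothetical layout)."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def read_slab(f, start, count):
    """Read `count` elements beginning at element index `start`."""
    f.seek(start * ITEM)
    slab = array("d")
    slab.fromfile(f, count)
    return slab


def jacobi_sweep_out_of_core(src_path, dst_path, n, slab_size):
    """One 1-D Jacobi sweep over an n-element file, one slab at a time.

    Interior points become the average of their two neighbours;
    the two boundary points are copied unchanged.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for lo in range(0, n, slab_size):
            hi = min(lo + slab_size, n)
            # Fetch the slab plus one ghost element on each side, if present.
            glo, ghi = max(lo - 1, 0), min(hi + 1, n)
            buf = read_slab(src, glo, ghi - glo)
            out = array("d")
            for i in range(lo, hi):
                if i == 0 or i == n - 1:
                    out.append(buf[i - glo])
                else:
                    out.append(0.5 * (buf[i - glo - 1] + buf[i - glo + 1]))
            out.tofile(dst)  # slabs are written back in order


def jacobi_sweep_in_core(values):
    """Reference sweep with the whole array in memory."""
    n = len(values)
    return [
        values[i] if i in (0, n - 1) else 0.5 * (values[i - 1] + values[i + 1])
        for i in range(n)
    ]


def demo():
    values = [float(i * i) for i in range(10)]
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "local_array.src")
        dst = os.path.join(tmp, "local_array.dst")
        write_local_file(src, values)
        jacobi_sweep_out_of_core(src, dst, len(values), slab_size=4)
        with open(dst, "rb") as f:
            result = list(read_slab(f, 0, len(values)))
    return result, jacobi_sweep_in_core(values)
```

Calling `demo()` returns the slab-by-slab result together with the in-memory reference so the two can be compared. In this sketch only `slab_size + 2` elements are held in memory at once; how large a slab to use, and how to overlap the reads with computation, are the kinds of choices an out-of-core compilation strategy would have to make.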