Computational models and program synthesis for parallel out-of-core computation
In this paper, we present a framework for synthesizing I/O-efficient out-of-core programs for block recursive algorithms, such as the fast Fourier transform (FFT) and block matrix transposition. Our framework uses an algebraic representation based on tensor products and other matrix operations. The programs are optimized for Vitter and Shriver's striped two-level memory model, in which data can be distributed using various $cyclic(B)$ distributions rather than only the commonly used physical track distribution $cyclic(B_d)$, where $B_d$ is the physical disk block size. We first introduce tensor bases to capture the semantics of block-cyclic distributions of out-of-core data as well as the access patterns to that data. We then present program generation techniques for tensor products and matrix transposition, and derive exact expressions for the number of parallel I/O operations required by the synthesized programs as a function of the tensor bases and data distributions. We give an algorithm that determines the data distribution optimizing the performance of a synthesized program, and we formalize the synthesis of efficient out-of-core programs for complete tensor product formulas under block-cyclic distributions as a dynamic programming problem. We demonstrate the effectiveness of the approach through several examples: choosing an appropriate data distribution can reduce the number of passes over out-of-core data by up to a factor of eight for a single tensor product, and the dynamic programming approach substantially reduces the number of passes for complete tensor product formulas.
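To make the dynamic-programming formulation concrete, the sketch below is an assumption-laden toy, not the paper's procedure. It assumes that under a $cyclic(2^b)$ distribution one pass over the out-of-core data can compute the FFT stages whose indices fall in a window of width $\log_2 M$ starting at bit position $b$, and it charges one extra pass whenever the distribution changes between stage groups; the paper instead derives exact parallel-I/O counts from tensor bases, which are not reproduced here. The names `N_EXP`, `M_EXP`, and `best` are hypothetical.

```python
# Toy dynamic program: choose a cyclic(2**b) distribution for each group of
# FFT stages so that the total number of passes over out-of-core data is
# minimized.  The cost model is a simplifying assumption made for
# illustration only, not the paper's exact parallel-I/O formulas.
from functools import lru_cache

N_EXP = 20   # problem size N = 2**20 points (hypothetical example)
M_EXP = 8    # in-core memory holds 2**8 points (hypothetical example)

@lru_cache(maxsize=None)
def best(stage, prev_b):
    """Minimum passes to compute stages stage..N_EXP-1, given that the data
    currently uses a cyclic(2**prev_b) distribution (None = not yet laid out)."""
    if stage >= N_EXP:
        return 0, ()
    best_cost, best_plan = float("inf"), ()
    for b in range(N_EXP):                    # candidate block-size exponents
        hi = min(b + M_EXP, N_EXP)            # stages reachable in one pass (assumed)
        if not (b <= stage < hi):
            continue                          # this distribution cannot advance
        redistribute = 1 if (prev_b is not None and b != prev_b) else 0
        rest_cost, rest_plan = best(hi, b)
        cost = 1 + redistribute + rest_cost   # one compute pass (+ redistribution)
        if cost < best_cost:
            best_cost, best_plan = cost, ((stage, hi, b),) + rest_plan
    return best_cost, best_plan

if __name__ == "__main__":
    passes, plan = best(0, None)
    print("estimated passes:", passes)
    for lo, hi, b in plan:
        print(f"stages [{lo}, {hi}) computed under cyclic(2^{b})")
```

Under these assumptions the planner groups the twenty stages into windows of at most eight and reports the distribution chosen for each group; the same skeleton could accommodate a more faithful pass-count function in place of the toy window rule.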