Efficient Data Parallel Algorithms for Multidimensional Array Operations Based on the EKMR Scheme for Distributed Memory Multicomputers

Authors:
Chun-Yuan Lin;Yeh-Ching Chung;Jen-Shiuh Liu
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2003

Abstract

Array operations are used in many important scientific codes, such as molecular dynamics, finite element methods, climate modeling, and atmosphere and ocean sciences. In our previous work, we proposed the extended Karnaugh map representation (EKMR) scheme for representing multidimensional arrays, and showed that sequential multidimensional array operation algorithms based on the EKMR scheme perform better than those based on the traditional matrix representation (TMR) scheme. Since parallel multidimensional array operations have been extensively investigated, in this paper we present efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers. In the data parallel programming paradigm, array elements are generally distributed to processors according to some distribution scheme, each processor performs its local computation, and the results are then collected from all processors. Based on the row, the column, and the 2D mesh distribution schemes, we design data parallel algorithms for matrix-matrix addition and matrix-matrix multiplication in both the TMR and the EKMR schemes for multidimensional arrays. We also design data parallel algorithms for six Fortran 90 array intrinsic functions: All, Maxval, Merge, Pack, Sum, and Cshift. We compare the time of the data distribution, local computation, and result collection phases of these array operations under the TMR and the EKMR schemes. The experimental results show that the algorithms based on the EKMR scheme outperform those based on the TMR scheme in all test cases.
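
The three phases named in the abstract (data distribution, local computation, result collection) can be illustrated with a minimal Python sketch of matrix-matrix addition under a row distribution. This is an illustration under stated assumptions, not the authors' algorithm: it works on an ordinary 2D list rather than an EKMR layout, it simulates the processors sequentially inside one process, and the names `row_partition` and `parallel_add` are hypothetical.

```python
def row_partition(n_rows, n_procs):
    """Return (start, end) row ranges, one contiguous block per processor.

    Assumption: a simple block row distribution where the first
    (n_rows % n_procs) processors receive one extra row.
    """
    base, extra = divmod(n_rows, n_procs)
    bounds = []
    start = 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def parallel_add(a, b, n_procs):
    """Add two equally sized 2D lists using the three data parallel phases.

    Processors are simulated by list entries; a real distributed memory
    implementation would send and receive these blocks over a network.
    """
    bounds = row_partition(len(a), n_procs)

    # Phase 1: data distribution. Each simulated processor gets its rows.
    local_blocks = [(a[lo:hi], b[lo:hi]) for lo, hi in bounds]

    # Phase 2: local computation. Each processor adds its own rows only.
    partial_results = []
    for block_a, block_b in local_blocks:
        partial_results.append(
            [[x + y for x, y in zip(row_a, row_b)]
             for row_a, row_b in zip(block_a, block_b)]
        )

    # Phase 3: result collection. Blocks are concatenated in processor order.
    result = []
    for block in partial_results:
        result.extend(block)
    return result
```

For example, `parallel_add(a, b, 2)` on two 3-row matrices gives the first simulated processor two rows and the second one row. A column or 2D mesh distribution would change only how the blocks are cut in phase 1 and reassembled in phase 3.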