Efficient Representation Scheme for Multidimensional Array Operations

Authors:
Chun-Yuan Lin;Jen-Shiuh Liu;Yeh-Ching Chung
Affiliations:
Feng Chia Univ., Taiwan, P.R. China;Feng Chia Univ., Taiwan, P.R. China;Feng Chia Univ., Taiwan, P.R. China
Venue:
IEEE Transactions on Computers
Year:
2002

Abstract

Array operations are used in many important scientific codes, such as molecular dynamics, finite element methods, and climate modeling. Many methods have been proposed in the literature to implement these array operations efficiently. However, the majority of these methods focus on two-dimensional arrays, and they usually do not perform well when extended to higher-dimensional arrays. Hence, designing efficient algorithms for multidimensional array operations is an important issue. In this paper, we propose a new scheme, extended Karnaugh map representation (EKMR), for multidimensional array representation. The main idea of the EKMR scheme is to represent a multidimensional array by a set of two-dimensional arrays, which makes efficient algorithm design for multidimensional array operations less complicated. To evaluate the proposed scheme, we design efficient algorithms for multidimensional array operations, namely matrix-matrix addition/subtraction and matrix-matrix multiplication, based on both the EKMR and the traditional matrix representation (TMR) schemes. Both theoretical analysis and experimental tests were conducted for these array operations. Since Fortran 90 provides a rich set of intrinsic functions for multidimensional array operations, the experimental tests also compare the performance of the intrinsic functions provided by the Fortran 90 compiler with that of the algorithms based on the EKMR scheme. The experimental results show that the algorithms based on the EKMR scheme outperform those based on the TMR scheme and those provided by the Fortran 90 compiler.
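
The idea of holding a multidimensional array in two-dimensional form can be illustrated with a minimal Python sketch for the three-dimensional case. The abstract does not state the index mapping, so the column mapping `j * n + k` used here is an assumption made for illustration, and the helper names `to_2d`, `add_2d`, and `add_3d` are hypothetical rather than taken from the paper.

```python
# Hypothetical sketch: store a 3-D array a3[k][i][j] of size n x n x n
# as a single 2-D array of size n x (n*n).
# The mapping column = j * n + k is an assumption, not taken from the abstract.

def to_2d(a3, n):
    """Return a 2-D list b such that b[i][j * n + k] == a3[k][i][j]."""
    b = [[0] * (n * n) for _ in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                b[i][j * n + k] = a3[k][i][j]
    return b

def add_2d(x, y):
    """Element-wise addition on the 2-D layout: two nested loops only."""
    return [[p + q for p, q in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]

def add_3d(x, y, n):
    """Reference addition on the 3-D layout: three nested loops."""
    return [[[x[k][i][j] + y[k][i][j] for j in range(n)]
             for i in range(n)]
            for k in range(n)]

n = 3
a = [[[k * 100 + i * 10 + j for j in range(n)] for i in range(n)] for k in range(n)]
b = [[[1 for _ in range(n)] for _ in range(n)] for _ in range(n)]

# Adding in the 2-D layout gives the same values as adding in 3-D and then converting.
flat_sum = add_2d(to_2d(a, n), to_2d(b, n))
```

In this sketch the addition over the two-dimensional layout needs one fewer loop level than the three-dimensional version, which is the kind of simplification the abstract attributes to the scheme. It makes no claim about the performance results reported in the paper.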