Array operations are used in many important scientific codes, such as molecular dynamics, finite element methods, and climate modeling. Many methods have been proposed in the literature to implement these array operations efficiently. However, most of these methods focus on two-dimensional arrays, and they usually do not perform well when extended to higher-dimensional arrays. Designing efficient algorithms for multidimensional array operations therefore becomes an important issue. In this paper, we propose a new scheme for multidimensional array representation, the extended Karnaugh map representation (EKMR). The main idea of the EKMR scheme is to represent a multidimensional array by a set of two-dimensional arrays, which makes efficient algorithm design for multidimensional array operations less complicated. To evaluate the proposed scheme, we design efficient algorithms for multidimensional array operations (matrix-matrix addition/subtraction and matrix-matrix multiplication) based on both the EKMR scheme and the traditional matrix representation (TMR) scheme. We conducted both theoretical analysis and experimental tests for these array operations. Since Fortran 90 provides a rich set of intrinsic functions for multidimensional array operations, the experimental tests also compare the performance of the intrinsic functions provided by the Fortran 90 compiler with that of the algorithms based on the EKMR scheme. The experimental results show that the algorithms based on the EKMR scheme outperform both those based on the TMR scheme and those provided by the Fortran 90 compiler.
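The general idea of storing a multidimensional array in two-dimensional form can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the EKMR index mapping, so the layout used here (row index `i`, column index `j * nk + k`) and the helper names `to_2d`, `from_2d`, and `add_2d` are assumptions chosen only for illustration.

```python
def to_2d(a3):
    """Store a3[k][i][j] (shape nk x ni x nj) as b[i][j * nk + k].

    Assumed layout for illustration: one index selects the row and the
    other two are combined into the column.
    """
    nk, ni, nj = len(a3), len(a3[0]), len(a3[0][0])
    b = [[0] * (nj * nk) for _ in range(ni)]
    for k in range(nk):
        for i in range(ni):
            for j in range(nj):
                b[i][j * nk + k] = a3[k][i][j]
    return b


def from_2d(b, nk):
    """Invert to_2d: rebuild the nk x ni x nj nested list from b."""
    ni = len(b)
    nj = len(b[0]) // nk
    return [[[b[i][j * nk + k] for j in range(nj)]
             for i in range(ni)]
            for k in range(nk)]


def add_2d(x, y):
    """Element-wise addition on the 2D form: two nested loops, not three."""
    return [[p + q for p, q in zip(row_x, row_y)]
            for row_x, row_y in zip(x, y)]


# Small 2 x 3 x 4 example; each value encodes its own (k, i, j) index.
a = [[[100 * k + 10 * i + j for j in range(4)]
      for i in range(3)]
     for k in range(2)]
flat = to_2d(a)
doubled = from_2d(add_2d(flat, flat), 2)
```

Under this assumed layout, an element-wise operation on a three-dimensional array becomes a doubly nested loop over a single two-dimensional array, which is the kind of simplification the abstract describes. Whether this matches the performance behaviour reported in the paper depends on the actual EKMR mapping and the storage order of the target language.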