Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Optimizing for parallelism and data locality
ICS '92 Proceedings of the 6th international conference on Supercomputing
High performance Fortran language specification (part III)
ACM SIGPLAN Fortran Forum
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Balancing processor loads and exploiting data locality in N-body simulations
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Numerical recipes in Fortran 90 (2nd ed.): the art of parallel scientific computing
Numerical recipes in Fortran 90 (2nd ed.): the art of parallel scientific computing
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallelization techniques for sparse matrix applications
Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
A compiler algorithm for optimizing locality in loop nests
ICS '97 Proceedings of the 11th international conference on Supercomputing
Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Efficient support of parallel sparse computation for array intrinsic functions of Fortran 90
ICS '98 Proceedings of the 12th international conference on Supercomputing
Modeling set associative caches behavior for irregular computations
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Improving Cache Locality by a Combination of Loop and Data Transformations
IEEE Transactions on Computers - Special issue on cache memory and related problems
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Recursive array layouts and fast parallel matrix multiplication
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
IEEE Transactions on Parallel and Distributed Systems
Efficient Representation Scheme for Multidimensional Array Operations
IEEE Transactions on Computers
Tuning Strassen's matrix multiplication for memory efficiency
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Compiling parallel code for sparse matrix applications
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Parallel Sparse Supports for Array Intrinsic Functions of Fortran 90
The Journal of Supercomputing
Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. 1
Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. 1
Hierarchical tiling for improved superscalar performance
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
A Singular Loop Transformation Framework Based on Non-Singular Matrices
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Minimal Data Dependence Abstractions for Loop Transformations
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Run-Time Optimization of Sparse Matrix-Vector Multiplication on SIMD Machines
PARLE '94 Proceedings of the 6th International PARLE Conference on Parallel Architectures and Languages Europe
A Relational Approach to the Compilation of Sparse Matrix Programs
Euro-Par '97 Proceedings of the Third International Euro-Par Conference on Parallel Processing
Cache Misses Prediction for High Performance Sparse Algorithms
Euro-Par '98 Proceedings of the 4th International Euro-Par Conference on Parallel Processing
Integrating Loop and Data Transformations for Global Optimisation
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Automatic Analytical Modeling for the Estimation of Cache Misses
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
The SPARAMAT Approach to Automatic Comprehension of Sparse Matrix Computations
IWPC '99 Proceedings of the 7th International Workshop on Program Comprehension
Caching-Efficient Multithreaded Fast Multiplication of Sparse Matrices
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
EUROMICRO '98 Proceedings of the 24th Conference on EUROMICRO - Volume 1
The Journal of Supercomputing
Data distribution schemes of sparse arrays on distributed memory multicomputers
The Journal of Supercomputing
Parallel computations in mixed formats
ACST'07 Proceedings of the third conference on IASTED International Conference: Advances in Computer Science and Technology
Hi-index | 0.00 |
Array operations are useful in a large number of important scientific codes, such as molecular dynamics, finite element methods, climate modeling, atmosphere and ocean sciences, etc. In our previous work, we have proposed a scheme extended Karnaugh map representation (EKMR) for multidimensional array representation. We have shown that sequential multidimensional array operation algorithms based on the EKMR scheme have better performance than those based on the traditional matrix representation (TMR) scheme. Since parallel multidimensional array operations have been an extensively investigated problem, in this paper, we present efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers. In data parallel programming paradigm, in general, we distribute array elements to processors based on various distribution schemes, do local computation in each processor, and collect computation results from each processor. Based on the row, the column, and the 2D mesh distribution schemes, we design data parallel algorithms for matrix-matrix addition and matrix-matrix multiplication array operations in both TMR and EKMR schemes for multidimensional arrays. We also design data parallel algorithms for six Fortran 90 array intrinsic functions, All, Maxval, Merge, Pack, Sum, and Cshift. We compare the time of the data distribution, the local computation, and the result collection phases of these array operations based on the TMR and the EKMR schemes. The experimental results show that algorithms based on the EKMR scheme outperform those based on the TMR scheme for all test cases.