A modular mapping consists of a linear transformation followed by modulo operations. It is characterized by a transformation matrix and a vector of moduli, called the modulus vector. Modular mappings are useful for deriving parallel versions of algorithms with commutative operations and of algorithms intended for execution on processor arrays with toroidal networks. To preserve algorithm correctness, modular mappings must be injective. Results of previous work characterize injective modular mappings of rectangular index sets. This paper provides a technique to generate modular mappings that satisfy these injectivity conditions and extends the results to general index sets. For an n-dimensional rectangular index set, the technique has O(n^2 n!) complexity. To facilitate generation of efficient code, modular mappings must also be reversible (i.e., have easily described inverses). An O(n^2) method is provided to generate reversible modular mappings. This method reduces the search space by fixing entries of the modulus vector, while attempting to minimize the number of fixed entries so that few solutions are excluded. For general index sets defined by linear inequalities, injectivity can be checked by formulating and solving a set of linear inequalities. A modified Fourier-Motzkin elimination is proposed to solve these inequalities. To generate an injective modular mapping of an index set defined by linear inequalities, this paper proposes a technique that attempts to minimize the values of the entries of the modulus vector. Several examples are provided to illustrate the application of the above-mentioned methods, including the case of BLAS routines.
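To make the definition concrete, the following is a minimal sketch of a modular mapping and a brute-force injectivity check on a small rectangular index set. It is not the generation technique of the paper: exhaustive enumeration is swapped in purely for illustration, and the names `modular_map` and `is_injective`, the matrices `T_good` and `T_bad`, and the 4 x 4 index set are all assumptions chosen for this example.

```python
from itertools import product


def modular_map(T, m, x):
    """Apply x -> (T x) mod m, with the modulo taken componentwise.

    T is the transformation matrix (list of rows), m the modulus vector.
    """
    return tuple(
        sum(t * xi for t, xi in zip(row, x)) % mod
        for row, mod in zip(T, m)
    )


def is_injective(T, m, bounds):
    """Brute-force injectivity check (illustrative only).

    The index set is assumed rectangular: 0 <= x[k] < bounds[k].
    Returns False as soon as two index points share an image.
    """
    seen = set()
    for x in product(*(range(b) for b in bounds)):
        image = modular_map(T, m, x)
        if image in seen:
            return False
        seen.add(image)
    return True


# Hypothetical example data: a 4 x 4 index set with modulus vector (4, 4).
m = (4, 4)
bounds = (4, 4)
T_good = [[1, 1], [0, 1]]  # (i, j) -> ((i + j) mod 4, j mod 4)
T_bad = [[1, 1], [1, 1]]   # (0, 1) and (1, 0) map to the same point

print(is_injective(T_good, m, bounds))
print(is_injective(T_bad, m, bounds))
```

With `T_good`, the second component recovers `j` and the first then recovers `i`, so the mapping has an easily described inverse on this index set; with `T_bad`, distinct points such as (0, 1) and (1, 0) collide. Enumeration costs time proportional to the size of the index set, which is why closed-form injectivity conditions are preferable in practice.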