On Time Mapping of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays

Authors:
W. Shang;J. A. B. Fortes
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1992



Abstract

Most existing methods for mapping algorithms into processor arrays are restricted to the case where n-dimensional algorithms, or algorithms with n nested loops, are mapped into (n-1)-dimensional arrays. In practice, however, it is also of interest to map n-dimensional algorithms into (k-1)-dimensional arrays where k < n. A computational conflict occurs if two or more computations of an algorithm are mapped into the same processor and the same execution time. Based on the Hermite normal form of the mapping matrix, necessary and sufficient conditions are derived to identify mappings without computational conflicts. These conditions are used to find time mappings of n-dimensional algorithms into (k-1)-dimensional arrays, k < n, without computational conflicts. For some applications, the mapping is time-optimal.
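To make the notion of a computational conflict concrete, the following is a minimal sketch that tests a candidate mapping by brute force. It is not the Hermite-normal-form condition derived in the paper; it simply enumerates a small index set and checks whether two index points receive the same image. The names `apply_mapping` and `find_conflict`, the box-shaped index set, and the two example matrices are assumptions made for illustration. The row layout is also assumed: the first row of `T` is read as the time schedule and the remaining k-1 rows as the processor coordinates. Dependence constraints on the schedule are ignored here.

```python
from itertools import product


def apply_mapping(T, j):
    """Return T*j as a tuple (time, processor coordinates...)."""
    return tuple(sum(t * x for t, x in zip(row, j)) for row in T)


def find_conflict(T, upper_bounds):
    """Brute-force conflict search over the box 0 <= j_i <= upper_bounds[i].

    Returns a pair of distinct index points with the same image under T,
    or None if the mapping is conflict-free on this index set.
    """
    seen = {}  # image (time, processor) -> first index point mapped to it
    for j in product(*(range(u + 1) for u in upper_bounds)):
        image = apply_mapping(T, j)
        if image in seen:
            return seen[image], j
        seen[image] = j
    return None


# Hypothetical example: a 3-dimensional index set mapped into a
# 1-dimensional (linear) array, so n = 3 and k = 2.
bounds = (2, 2, 2)

# Conflict-free on this box: the first row alone is injective,
# because it reads each point as a base-3 number.
T_good = [[1, 3, 9],
          [1, 1, 1]]

# Has conflicts: (0, 1, 0) and (1, 0, 0) both map to time 1, processor 1.
T_bad = [[1, 1, 1],
         [1, 1, 0]]

print(find_conflict(T_good, bounds))  # None
print(find_conflict(T_bad, bounds))   # ((0, 1, 0), (1, 0, 0))
```

Enumeration costs time and memory proportional to the size of the index set, so it is only practical for small problems. A closed-form condition on the mapping matrix, as described in the abstract, avoids this enumeration.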