Consideration is given to transforming depth-p nested for-loop algorithms into q-dimensional systolic VLSI arrays, where 1 ≤ q ≤ p − 1. Previously, complete characterizations of correct transformations existed only for the cases q = p − 1 and q = 1. This gap is filled by giving formal necessary and sufficient conditions for the correct transformation of a p-nested loop algorithm into a q-dimensional systolic array for any q with 1 ≤ q ≤ p − 1. Practical methods are presented. The techniques developed are applied to the automatic design of special-purpose and programmable systolic arrays. The results also contribute toward automatic compilation onto more general-purpose programmable arrays. The synthesis of linear and planar systolic array implementations for a three-dimensional cube-graph algorithm and for a reindexed Warshall-Floyd path-finding algorithm is used to illustrate the method.
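To make the notion of a "correct transformation" more concrete, the following Python sketch tests a candidate linear space-time mapping by brute force. It is not the paper's necessary and sufficient conditions. It assumes a rectangular index space, uniform dependence vectors, a linear schedule vector, and a q-row allocation matrix, and it checks only two properties: every dependence advances time, and no two loop iterations land on the same processor at the same time step. Interconnection constraints are ignored. The names `check_mapping`, `bounds`, `deps`, `schedule`, and `alloc` are hypothetical and chosen for illustration.

```python
from itertools import product


def dot(u, v):
    """Integer dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def check_mapping(bounds, deps, schedule, alloc):
    """Brute-force validity test for a linear space-time mapping (illustrative only).

    bounds   -- loop trip counts; index k runs over 0 .. bounds[k] - 1
    deps     -- uniform dependence vectors (consumer index minus producer index)
    schedule -- time vector: iteration j executes at time dot(schedule, j)
    alloc    -- q rows: iteration j runs on processor (dot(row, j) for each row)
    """
    # Causality: each dependence must be separated by at least one time step.
    if any(dot(schedule, d) <= 0 for d in deps):
        return False

    # Conflict freedom: no two iterations may share a (time, processor) slot.
    seen = set()
    for j in product(*(range(b) for b in bounds)):
        slot = (dot(schedule, j), tuple(dot(row, j) for row in alloc))
        if slot in seen:
            return False
        seen.add(slot)
    return True


# A 3-nested loop over a 3 x 3 x 3 cube with unit dependences (assumed example).
CUBE_BOUNDS = (3, 3, 3)
CUBE_DEPS = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

if __name__ == "__main__":
    # Planar array (q = 2): processor = (i, j), time = i + j + k.
    print(check_mapping(CUBE_BOUNDS, CUBE_DEPS, (1, 1, 1), [(1, 0, 0), (0, 1, 0)]))  # True
    # Linear array (q = 1) with the same schedule: iterations collide.
    print(check_mapping(CUBE_BOUNDS, CUBE_DEPS, (1, 1, 1), [(1, 0, 0)]))  # False
    # Linear array (q = 1) with a stretched schedule: conflict free.
    print(check_mapping(CUBE_BOUNDS, CUBE_DEPS, (1, 1, 3), [(1, 0, 0)]))  # True
```

The second and third calls show why the intermediate and linear cases need more care than q = p − 1: a schedule that is valid for the planar array produces collisions once one more dimension is projected away, and a different schedule is needed to restore conflict freedom. Exhaustive enumeration is practical only for small index spaces.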