Automatic synthesis of systolic arrays from uniform recurrent equations

Authors:
Patrice Quinton
Affiliations:
IRISA-CNRS, Campus de Beaulieu - 35042 Rennes-Cedex, France
Venue:
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Year:
1984

Abstract

We describe a systematic method for the design of systolic arrays. The method applies to algorithms that can be expressed as a set of uniform recurrent equations over a convex set D of Cartesian coordinates. Most of the algorithms already considered for systolic implementation can be represented in this way. The method consists of two steps: first, finding a timing function for the computations that is compatible with the dependences introduced by the equations; second, mapping the domain D onto another finite set of coordinates, each representing a processor of the systolic array, in such a way that concurrent computations are mapped onto different processors. The scheduling and mapping functions satisfy conditions that allow the method to be fully automated. The method is illustrated on the convolution product and the matrix product.
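The two steps can be illustrated with a small sketch. It is not the paper's procedure. It assumes a linear timing function t(z) = lam . z, a hypothetical set of dependence vectors for a uniform-recurrence form of 1-D convolution, and a processor allocation obtained by projecting the domain along a direction u. All names (`DEPS`, `find_timing`, `processor`, `conflict_free`) are made up for illustration, and the brute-force search stands in for whatever optimization the method actually uses.

```python
from itertools import product

# Assumed dependence vectors (i, k) for a uniform-recurrence form of
# 1-D convolution: partial sums move along k, weights along i, and
# inputs along the diagonal. These are illustrative, not from the paper.
DEPS = [(0, 1), (1, 0), (1, 1)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def is_valid_timing(lam, deps):
    """t(z) = lam . z respects a dependence d when lam . d >= 1."""
    return all(dot(lam, d) >= 1 for d in deps)


def find_timing(deps, bound=2):
    """Brute-force search over small integer vectors (an assumption of
    this sketch); returns the valid vector with the smallest coefficients."""
    dim = len(deps[0])
    candidates = [
        lam
        for lam in product(range(-bound, bound + 1), repeat=dim)
        if is_valid_timing(lam, deps)
    ]
    return min(candidates, key=lambda lam: (sum(abs(c) for c in lam), lam))


def processor(point, u):
    """Project a 2-D point along u = (a, b): points that differ by a
    multiple of u receive the same processor index."""
    i, k = point
    a, b = u
    return a * k - b * i


def conflict_free(domain, lam, u):
    """True when no two domain points share both a time step and a processor."""
    seen = set()
    for z in domain:
        slot = (dot(lam, z), processor(z, u))
        if slot in seen:
            return False
        seen.add(slot)
    return True


# Rectangular (hence convex) domain: N outputs, K filter taps.
N, K = 6, 3
domain = [(i, k) for i in range(N) for k in range(K)]
lam = find_timing(DEPS)
print(lam, conflict_free(domain, lam, (1, 0)), conflict_free(domain, lam, (1, -1)))
```

In this sketch the projection direction (1, 0) gives one processor per filter tap, while (1, -1) is rejected because it is orthogonal to the timing vector, so computations scheduled at the same time would land on the same processor.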