Partitioning and Mapping Algorithms into Fixed Size Systolic Arrays

Authors:
Dan I. Moldovan; Jose A. B. Fortes
Affiliations:
Univ. of Southern California, Los Angeles;Purdue Univ., West Lafayette, IN
Venue:
IEEE Transactions on Computers
Year:
1986

Citing 4
Cited 87

The Organization of Computations for Uniform Recurrence Equations

Journal of the ACM (JACM)
Wavefront Array Processor: Language, Architecture, and Applications

IEEE Transactions on Computers
Pin Limitations and Partitioning of VLSI Interconnection Networks

IEEE Transactions on Computers
On the Analysis and Synthesis of VLSI Algorithms

IEEE Transactions on Computers

Optimal Systolic Design for the Transitive Closure and the Shortest Path Problems

IEEE Transactions on Computers
Matching algorithms to array processors

ACM '87 Proceedings of the 1987 Fall Joint Computer Conference on Exploring technology: today and tomorrow
Orderings and partition of PDE computations for a fixed-size VLSI architecture

ACM '87 Proceedings of the 1987 Fall Joint Computer Conference on Exploring technology: today and tomorrow
Synthesizing Linear Array Algorithms from Nested FOR Loop Algorithms

IEEE Transactions on Computers
Pipelined data parallel algorithms—concept and modeling

ICS '88 Proceedings of the 2nd international conference on Supercomputing
The symbolic hyperplane transformation for recursively defined arrays

Proceedings of the 1988 ACM/IEEE conference on Supercomputing
On high-speed computing with a programmable linear array

Proceedings of the 1988 ACM/IEEE conference on Supercomputing
Supernode partitioning

POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Systematic hardware adaptation of systolic algorithms

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors

IEEE Transactions on Computers
Optimal Dynamic Remapping of Data Parallel Computations

IEEE Transactions on Computers
On the Design of a Unidirectional Systolic Array for Key Enumeration

IEEE Transactions on Computers
A systolic array processor for biological information signal processing

ICS '91 Proceedings of the 5th international conference on Supercomputing
Time Optimal Linear Schedules for Algorithms with Uniform Dependencies

IEEE Transactions on Computers
Independent Partitioning of Algorithms with Uniform Dependencies

IEEE Transactions on Computers
Synthesis aspects in the design of efficient processor arrays from affine recurrence equations

Journal of Symbolic Computation - Special issue on automatic programming
Partitioning the statement per iteration space using non-singular matrices

ICS '93 Proceedings of the 7th international conference on Supercomputing
Sequencer-based data path synthesis of regular iterative algorithms

DAC '94 Proceedings of the 31st annual Design Automation Conference
Some New Designs of 2-D Array for Matrix Multiplication and Transitive Closure

IEEE Transactions on Parallel and Distributed Systems
Compiler technology for parallel scientific computation

Scientific Programming
Integral knapsack problems: parallel algorithms and their implementations on distributed systems

ICS '95 Proceedings of the 9th international conference on Supercomputing
A Modular Systolic Linearization of the Warshall-Floyd Algorithm

IEEE Transactions on Parallel and Distributed Systems
Automatic optimization of communication in compiling out-of-core stencil codes

ICS '96 Proceedings of the 10th international conference on Supercomputing
Achieving Full Parallelism Using Multidimensional Retiming

IEEE Transactions on Parallel and Distributed Systems
Optimal Data Scheduling for Uniform Multidimensional Applications

IEEE Transactions on Computers
Optimization of the background memory utilization by partitioning

ISSS '97 Proceedings of the 10th international symposium on System synthesis
Designing a Scalable Processor Array for Recurrent Computations

IEEE Transactions on Parallel and Distributed Systems
A Unifying Lattice-Based Approach for the Partitioning of Systolic Arrays via LPGS and LSGP

Journal of VLSI Signal Processing Systems
Improved Compressions of Cube-Connected Cycles Networks

IEEE Transactions on Parallel and Distributed Systems
Alpha du centaur: a prototype environment for the design of parallel regular algorithms

ICS '89 Proceedings of the 3rd international conference on Supercomputing
Comparison of several techniques for generating systolic arrays

CSC '88 Proceedings of the 1988 ACM sixteenth annual conference on Computer science
Some new systolic designs for two-dimensional convolution

CSC '88 Proceedings of the 1988 ACM sixteenth annual conference on Computer science
An Approach to Checking Link Conflicts in the Mapping of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays

IEEE Transactions on Computers
A toroidal systolic array for Warshall's algorithm

CSC '91 Proceedings of the 19th annual conference on Computer Science
Hybrid static-dynamic communication scheduling for parallel systems

SAC '97 Proceedings of the 1997 ACM symposium on Applied computing
A Space-Time Representation Method of Iterative Algorithms for the Design of Processor Arrays

Journal of VLSI Signal Processing Systems
Optimal semi-oblique tiling

Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Systolic Opportunities for Multidimensional Data Streams

IEEE Transactions on Parallel and Distributed Systems
Automatic data and computation decomposition on distributed memory parallel computers

ACM Transactions on Programming Languages and Systems (TOPLAS)
PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators

Journal of VLSI Signal Processing Systems
Design of Processor Arrays for Reconfigurable Architectures

The Journal of Supercomputing
On the Relationship Between Two Systolic Array Design Methodologies

IEEE Transactions on Computers
A Family of Efficient Regular Arrays for Algebraic Path Problem

IEEE Transactions on Computers
Design of Space-Optimal Regular Arrays for Algorithms with Linear Schedules

IEEE Transactions on Computers
On Mapping Systolic Algorithms onto the Hypercube

IEEE Transactions on Parallel and Distributed Systems
Mapping Nested Loop Algorithms into Multidimensional Systolic Arrays

IEEE Transactions on Parallel and Distributed Systems
Uniform Approach for Solving some Classical Problems on a Linear Array

IEEE Transactions on Parallel and Distributed Systems
Synthesizing Nested Loop Algorithms Using Nonlinear Transformation Method

IEEE Transactions on Parallel and Distributed Systems
Partitioning and Mapping Nested Loops on Multiprocessor Systems

IEEE Transactions on Parallel and Distributed Systems
On Time Mapping of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays

IEEE Transactions on Parallel and Distributed Systems
HARP: An Open Architecture for Parallel Matrix and Signal Processing

IEEE Transactions on Parallel and Distributed Systems
Loop Transformation Using Nonunimodular Matrices

IEEE Transactions on Parallel and Distributed Systems
A General Methodology of Partitioning and Mapping for Given Regular Arrays

IEEE Transactions on Parallel and Distributed Systems
Knapsack on VLSI: from Algorithm to Optimal Circuit

IEEE Transactions on Parallel and Distributed Systems
Exact Partitioning of Affine Dependence Algorithms

Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
A Skeleton for Parallel Dynamic Programming

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Combining Serialisation and Reconfiguration for FPGA Designs

FPL '00 Proceedings of The Roadmap to Reconfigurable Computing, 10th International Workshop on Field-Programmable Logic and Applications
An Emulator for Exploring RaPiD Configurable Computing Architectures

FPL '01 Proceedings of the 11th International Conference on Field-Programmable Logic and Applications
Loop Tiling for Reconfigurable Accelerators

FPL '01 Proceedings of the 11th International Conference on Field-Programmable Logic and Applications
Towards the automatic optimal mapping of pipeline algorithms

Parallel Computing
Exact partitioning of affine dependence algorithms

Embedded processor design challenges
Hexagonal systolic arrays for matrix multiplication

Highly parallel computations
Three-dimensional orthogonal tile sizing problem: mathematical programming approach

ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
Scheduling in Co-Partitioned Array Architectures

ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
Architecture Design of Reconfigurable Pipelined Datapaths

ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Combining Serialization and Reconfiguration for Convolver Designs

FCCM '00 Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines
Evaluation of Loop Grouping Methods Based on Orthogonal Projection Spaces

ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
A Scalable Architecture for Modular Multiplication Based on Montgomery's Algorithm

IEEE Transactions on Computers
Implementing an OFDM Receiver on the RaPiD Reconfigurable Architecture

IEEE Transactions on Computers
Mapping dynamic programming onto modular linear systolic arrays

Distributed Computing
Computing transitive closure on systolic arrays of fixed size

Distributed Computing
Towards systolizing compilation

Distributed Computing
Design and Evaluation of Dynamic Key Message Algorithms for Cluster Computing

HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Automatic mapping of nested loops to FPGAs

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Cronus: A platform for parallel code generation based on computational geometry methods

Journal of Systems and Software
Automatic generation of a parallel tile processing unit for algorithms with non-affine array references

IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
A new modular exponentiation architecture for efficient design of RSA cryptosystem

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A new algorithm for high-speed modular multiplication design

IEEE Transactions on Circuits and Systems Part I: Regular Papers
Designing of processor-time optimal systolic arrays for band matrix-vector multiplication

Computers & Mathematics with Applications
Hardware Acceleration of HMMER on FPGAs

Journal of Signal Processing Systems
Short Communication: Synthesis of algorithms on processor arrays

Parallel Computing
Short Communication: Array size anomaly of problem-size independent systolic arrays for matrix-vector multiplication

Parallel Computing
Efficient fixed-size systolic arrays for the modular multiplication

COCOON'99 Proceedings of the 5th annual international conference on Computing and combinatorics
Geometric scheduling of 2-D UET-UCT uniform dependence loops

EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
A highly parameterized and efficient FPGA-based skeleton for pairwise biological sequence alignment

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
High performance phylogenetic analysis with maximum parsimony on reconfigurable hardware

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Determining objective functions in systolic array designs

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Quantified Score

Hi-index	15.02

Abstract

A technique for partitioning and mapping algorithms into VLSI systolic arrays is presented in this paper. Algorithm partitioning is essential when the size of a computational problem is larger than the size of the VLSI array intended for that problem. Computational models are introduced for systolic arrays and iterative algorithms. First, we discuss the mapping of algorithms into arbitrarily large VLSI arrays. This mapping is based on the idea of algorithm transformations. Then, we present an approach to algorithm partitioning that is also based on algorithm transformations. Our approach to the partitioning problem is to divide the algorithm index set into bands and to map these bands into the processor space. The partitioning and mapping technique developed throughout the paper is summarized as a six-step procedure. A computer program implementing this procedure was developed, and some results obtained with this program are presented.
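The idea of dividing the index set into bands and mapping each band into the processor space can be illustrated with a small sketch. This is not the paper's six-step procedure: the function names, the 2-D index set, the schedule vector `pi`, the space vector `s`, and the rule of cutting the virtual processor axis into bands of width `m` are all assumptions chosen for illustration. Each index point gets a time step from a linear schedule, a virtual processor from a linear space map, and a band number with a physical cell on an array of fixed size `m`. Bands are assumed to run one after another.

```python
from itertools import product


def partition_and_map(shape, pi, s, m):
    """Assign each index point a (band, time, cell) slot on an m-cell linear array.

    shape: extents of the rectangular index set (hypothetical example input)
    pi:    linear schedule vector, time = pi . j
    s:     linear space vector, virtual processor = s . j
    m:     number of physical cells in the fixed-size array
    """
    placement = {}
    for j in product(*(range(n) for n in shape)):
        t = sum(a * b for a, b in zip(pi, j))   # time step of point j
        p = sum(a * b for a, b in zip(s, j))    # virtual processor of point j
        band, cell = divmod(p, m)               # band index, cell inside the band
        placement[j] = (band, t, cell)
    return placement


def is_valid(placement, deps):
    """Check the sketch's two conditions for running bands sequentially.

    1. No two index points share the same (band, time, cell) slot.
    2. Every dependence source precedes its target in (band, time) order.
    """
    slots = list(placement.values())
    if len(slots) != len(set(slots)):
        return False
    for j, (band, t, _) in placement.items():
        for d in deps:
            src = tuple(a - b for a, b in zip(j, d))
            if src in placement and placement[src][:2] >= (band, t):
                return False
    return True


# A 4 x 6 index set with dependence vectors (1, 0) and (0, 1),
# mapped onto a 3-cell array: virtual processors 0..5 form two bands.
placement = partition_and_map((4, 6), pi=(1, 1), s=(0, 1), m=3)
print(is_valid(placement, deps=[(1, 0), (0, 1)]))
```

In this sketch the bands are ordered along the direction of the dependence vector `(0, 1)`, so data only flows from an earlier band to a later one. A dependence pointing the other way, such as `(0, -1)`, would fail the check under the same schedule.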