A Unifying Lattice-Based Approach for the Partitioning of Systolic Arrays via LPGS and LSGP

Authors:
Karl-Heinz Zimmermann
Affiliations:
Department of Electrical and Computer Engineering, Technical University Hamburg-Harburg, 21071 Hamburg, Germany
Venue:
Journal of VLSI Signal Processing Systems
Year:
1997

Abstract

Various methods for the synthesis of systolic arrays from signal and image processing algorithms have been developed in the past few years. In this paper, we propose a technique for the partitioning problem, the problem to synthesize systolic arrays whose size does not match the problem size. Our technique generalizes most of the known lattice-based approaches to the partitioning problem and combines the multiprojection method for the synthesis of systolic arrays with the locally sequential-globally parallel (LSGP) and locally parallel-globally sequential (LPGS) partitioning schemes. Starting from (1) a k-dimensional large-size systolic array obtained from a system of n-dimensional uniform recurrences by a space-time transformation and (2) an arbitrary lattice in k-space inducing a partitioning of the array into subarrays, a small-size systolic array with a scalar-valued system clock is constructed via the LSGP or LPGS paradigm. In particular, the allocation function for the small-size array can be written in closed form and the timing function is obtained from timing functions for the subdomains, the set of operations performed by the subarrays, by simple greedy algorithms. In this way, the problem of finding optimal timing functions can in various cases be reduced to finding optimal timing functions for the subdomains. For problems of large size, these greedy algorithms seem to be preferable when compared with existing integer or non-convex programming formulations for finding (sub-)optimal timing functions. We also provide some new results, a necessary and sufficient condition for the existence of counter data flow, a formal relationship between partitionings of processor space and index space of the uniform recurrences in terms of counter data flow, and the structural equivalence between the lattice-based LSGP and LPGS schemes applied to the partitioning of index and processor space.
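
As a rough illustration of the lattice-induced partitioning the abstract describes, the sketch below splits a processor coordinate of a 2-dimensional large-size array into a tile index and an offset within the tile, and reads off an LSGP-style and an LPGS-style allocation from that split. It is not the paper's closed-form allocation function and it ignores timing functions entirely. The restriction to two dimensions, the integer basis matrix `B` (columns are the lattice basis vectors), and the names `split`, `lsgp_alloc` and `lpgs_alloc` are assumptions made for this example.

```python
from typing import Tuple

Vec = Tuple[int, int]
Mat = Tuple[Vec, Vec]  # two rows; the COLUMNS are the lattice basis vectors


def split(p: Vec, B: Mat) -> Tuple[Vec, Vec]:
    """Write p = B*q + r, with q the tile index and r the offset inside the tile.

    The offset r lies in the half-open parallelepiped spanned by the columns
    of B. Exact integer arithmetic is used (adjugate and determinant).
    """
    (a, b), (c, d) = B
    det = a * d - b * c
    if det == 0:
        raise ValueError("basis vectors must be linearly independent")
    # adj(B) * p equals det * (B^-1 * p)
    u = d * p[0] - b * p[1]
    v = -c * p[0] + a * p[1]
    # Python's // is a true floor, also for negative operands
    q = (u // det, v // det)
    r = (p[0] - (a * q[0] + b * q[1]), p[1] - (c * q[0] + d * q[1]))
    return q, r


def lsgp_alloc(p: Vec, B: Mat) -> Vec:
    """LSGP (assumed reading): one small-array processor per tile;
    the processors inside a tile are executed sequentially."""
    return split(p, B)[0]


def lpgs_alloc(p: Vec, B: Mat) -> Vec:
    """LPGS (assumed reading): the small array has the shape of one tile;
    the tiles are executed one after another."""
    return split(p, B)[1]


if __name__ == "__main__":
    rect = ((2, 0), (0, 3))   # rectangular 2 x 3 tiles
    skew = ((2, 1), (0, 2))   # skewed lattice with basis vectors (2,0), (1,2)
    print(lsgp_alloc((5, 4), rect), lpgs_alloc((5, 4), rect))
    print(lsgp_alloc((3, 3), skew), lpgs_alloc((3, 3), skew))
```

In this sketch the LPGS small array has as many processors as there are distinct offsets, which is |det B|, while the LSGP small array has one processor per tile that intersects the large array.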