Partitioning and Mapping Algorithms into Fixed Size Systolic Arrays
IEEE Transactions on Computers
Various methods for the synthesis of systolic arrays from signal and image processing algorithms have been developed in the past few years. In this paper, we propose a technique for the partitioning problem, the problem of synthesizing systolic arrays whose size does not match the problem size. Our technique generalizes most of the known lattice-based approaches to the partitioning problem and combines the multiprojection method for the synthesis of systolic arrays with the locally sequential-globally parallel (LSGP) and locally parallel-globally sequential (LPGS) partitioning schemes. Starting from (1) a k-dimensional large-size systolic array obtained from a system of n-dimensional uniform recurrences by a space-time transformation and (2) an arbitrary lattice in k-space inducing a partitioning of the array into subarrays, a small-size systolic array with a scalar-valued system clock is constructed via the LSGP or LPGS paradigm. In particular, the allocation function for the small-size array can be written in closed form, and the timing function is obtained from timing functions for the subdomains (the sets of operations performed by the subarrays) by simple greedy algorithms. In this way, the problem of finding optimal timing functions can in various cases be reduced to finding optimal timing functions for the subdomains. For problems of large size, these greedy algorithms seem to be preferable to existing integer or non-convex programming formulations for finding (sub-)optimal timing functions. We also provide some new results: a necessary and sufficient condition for the existence of counter data flow, a formal relationship between partitionings of processor space and index space of the uniform recurrences in terms of counter data flow, and the structural equivalence between the lattice-based LSGP and LPGS schemes applied to the partitioning of index and processor space.
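The difference between the two partitioning schemes can be illustrated with a minimal sketch. It assumes the simplest possible case, a rectangular (axis-aligned) lattice given by a tile shape, whereas the paper treats arbitrary lattices in k-space; the function names `split`, `lsgp`, and `lpgs` are hypothetical and are not taken from the paper. Under this assumption, each processor coordinate of the large array splits componentwise into a block index and an offset within the block. LSGP uses the block index as the physical processor and runs the offsets sequentially; LPGS swaps the two roles.

```python
from typing import Tuple

Vec = Tuple[int, ...]


def split(p: Vec, tile: Vec) -> Tuple[Vec, Vec]:
    """Split a processor coordinate into (block index, offset in block).

    Assumes a rectangular lattice: component i of the tile shape is the
    block extent along axis i.
    """
    block = tuple(x // t for x, t in zip(p, tile))
    offset = tuple(x % t for x, t in zip(p, tile))
    return block, offset


def lsgp(p: Vec, tile: Vec) -> Tuple[Vec, Vec]:
    """LSGP: one physical processor per block; the block's nodes run sequentially.

    Returns (physical processor, sequential slot inside that processor).
    """
    block, offset = split(p, tile)
    return block, offset


def lpgs(p: Vec, tile: Vec) -> Tuple[Vec, Vec]:
    """LPGS: one physical processor per offset; blocks are processed one after another.

    Returns (physical processor, sequential pass in which the block is executed).
    """
    block, offset = split(p, tile)
    return offset, block


if __name__ == "__main__":
    # Hypothetical 4 x 6 large-size array partitioned with 2 x 3 tiles.
    tile = (2, 3)
    large = [(i, j) for i in range(4) for j in range(6)]
    print(len({lsgp(p, tile)[0] for p in large}), "physical processors under LSGP")
    print(len({lpgs(p, tile)[0] for p in large}), "physical processors under LPGS")
```

In this sketch the two mappings differ only by exchanging the quotient and the remainder, which is one informal way to read the structural equivalence between the lattice-based LSGP and LPGS schemes mentioned above. The sketch covers allocation only; it does not construct the timing functions.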