This paper considers the problem of mapping systolic array algorithms onto efficient algorithms for a fixed-size hypercube architecture. The authors describe in detail several optimal implementations of algorithms designed for one-way one- and two-dimensional systolic arrays. In parallel computers built to date, interprocessor communication is many times slower than local computation, so these mappings specifically address efficient communication. To validate the technique experimentally, five systolic algorithms were mapped in various ways onto a 64-node NCUBE/7 MIMD hypercube machine. The algorithms solve the following problems: the shuffle scheduling problem, finite impulse response filtering, linear context-free language recognition, matrix multiplication, and computing the Boolean transitive closure. The experimental results indicate that the mappings achieve good performance.
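The following is a minimal sketch of one common way to place a one-way linear systolic array on a fixed-size hypercube. It is not necessarily the authors' mapping. The approach rests on three assumptions:

- The cells are split into 2^d contiguous blocks.
- Block b is assigned to node gray(b), using the binary-reflected Gray code.
- Consecutive blocks therefore land on nodes one link apart.

Only the systolic links that cross a block boundary need interprocessor messages. Every other link becomes a local transfer, which matters when communication dominates computation. The function names `map_linear_array` and `boundary_messages` are hypothetical illustrations.

```python
def gray(i: int) -> int:
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def hamming(a: int, b: int) -> int:
    """Number of address bits in which hypercube nodes a and b differ."""
    return bin(a ^ b).count("1")


def map_linear_array(num_cells: int, dim: int) -> list[int]:
    """Return owner[c] = hypercube node holding systolic cell c (hypothetical helper).

    The cells are split into 2**dim contiguous blocks whose sizes differ by at
    most one. Block b is assigned to node gray(b), so consecutive blocks sit on
    nodes whose addresses differ in exactly one bit (direct neighbours).
    """
    p = 1 << dim
    if num_cells < p:
        raise ValueError("need at least one cell per node")
    base, extra = divmod(num_cells, p)
    owner: list[int] = []
    for b in range(p):
        size = base + (1 if b < extra else 0)
        owner.extend([gray(b)] * size)
    return owner


def boundary_messages(owner: list[int]) -> int:
    """Count systolic links whose two endpoints live on different nodes.

    Only these links need an interprocessor message per systolic step; links
    inside a block are plain local memory transfers.
    """
    return sum(1 for a, b in zip(owner, owner[1:]) if a != b)


if __name__ == "__main__":
    # 1024 cells on a 64-node (d = 6) hypercube, the machine size used above.
    owner = map_linear_array(1024, 6)
    print(boundary_messages(owner))
    print(max(hamming(a, b) for a, b in zip(owner, owner[1:])))
```

For 1024 cells on 64 nodes, each node holds 16 consecutive cells. Only 63 of the 1023 systolic links cross node boundaries, and each of those crosses a single hypercube link. The Gray-code assignment is what guarantees the single-hop property. A plain `b -> b` assignment would send, for example, block 7 (binary 000111) to block 8 (binary 001000) across four dimensions.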