References
- Modified Faddeeva Algorithm for Concurrent Execution of Linear Algebraic Operations. IEEE Transactions on Computers.
- Limits of instruction-level parallelism. ASPLOS IV: Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems.
- Limits of control flow on parallelism. ISCA '92: Proceedings of the 19th Annual International Symposium on Computer Architecture.
- Computer Architecture: A Quantitative Approach (2nd ed.).
- Systolic Parallel Processing. Computer.
In this paper we consider systolic programs for the most common DSP algorithms (convolution, FIR and IIR filtering, FFT) and matrix algorithms (multiplication, triangularisation, linear equation solving, the modified Faddeev algorithm), executed on systolic arrays of various topologies (linear, 2D mesh, hexagonal). We examine the algorithm-specific parameters (number of I/O paths, unit delays) and the program-dependent parameters (program length, data location requirements, basic block lengths, branch behaviour, instruction usage, computation-to-communication ratio) of our program set, executed on a single processing cell of the systolic arrays. The analysis is based on the static object code. We found that average basic block lengths are 17.1 (DSP) and 8.4 (Matrix) instructions. Divide/square-root operations play a major role in the matrix algorithms (more than 15% of the weighted instruction set). Inter-cell communication must be efficient, since the computation-to-communication ratio is only 1.2 to 1.4, orders of magnitude smaller than in typical MIMD applications.
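To illustrate why the per-cell computation-to-communication ratio is so low, the following minimal sketch (not taken from the paper; the function name and register layout are illustrative assumptions) models one of the workloads studied, an M-tap FIR filter on a linear array. Each cell holds one coefficient and performs a single multiply-accumulate per step while passing its input sample to its neighbour, so every computation is paired with roughly one communication:

```python
def fir_systolic(h, x):
    """First len(x) samples of y[n] = sum_k h[k] * x[n-k].

    Hypothetical sketch of a direct-form FIR on a linear systolic
    array: cell k stores coefficient h[k] and one delay register.
    """
    M = len(h)
    x_reg = [0.0] * M  # per-cell delay registers
    out = []
    for xn in x:
        # One communication per cell: samples shift down the array.
        x_reg = [xn] + x_reg[:-1]
        # One multiply-accumulate per cell per step.
        out.append(sum(h[k] * x_reg[k] for k in range(M)))
    return out

print(fir_systolic([1, 2], [1, 1, 1]))  # -> [1, 3, 3]
```

Counting operations per step in this sketch (M shifts against M multiply-accumulates) gives a ratio near 1, consistent with the 1.2 to 1.4 values reported, and far below the ratios typical of coarse-grained MIMD workloads.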