References
- Modified Faddeeva Algorithm for Concurrent Execution of Linear Algebraic Operations. IEEE Transactions on Computers.
- Limits of instruction-level parallelism. ASPLOS IV: Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems.
- Limits of control flow on parallelism. ISCA '92: Proceedings of the 19th Annual International Symposium on Computer Architecture.
- Computer Architecture: A Quantitative Approach (2nd ed.).
- Systolic Parallel Processing. Computer.
In this paper we consider systolic programs for the most common DSP algorithms (convolution, FIR and IIR filtering, FFT) and matrix algorithms (multiplication, triangularisation, linear equation solving, the modified Faddeev algorithm), executed on systolic arrays of various topologies (linear, 2D mesh, hexagonal). We examine the algorithm-specific parameters (number of I/O paths, unit delays) and the program-dependent parameters (program length, data location requirements, basic block lengths, branch behaviour, instruction usage, computation-to-communication ratio) of our program set, executed on a single processing cell of the systolic arrays. The analysis is based on the static object code. We found that average basic block lengths are 17.1 (DSP) and 8.4 (Matrix) instructions. Divide/square-root operations play a major role in the matrix algorithms (more than 15% of the weighted instruction set). Inter-cell communication must be efficient, since the computation-to-communication ratio is only 1.2 to 1.4, orders of magnitude smaller than in typical MIMD applications.
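To illustrate why the per-cell computation-to-communication ratio is so low, the following minimal sketch (not taken from the paper; the function name and register layout are illustrative assumptions) models one of the workloads studied, an M-tap FIR filter on a linear array. Each cell holds one coefficient and performs a single multiply-accumulate per step while passing its input sample to its neighbour, so every computation is paired with roughly one communication:

```python
def fir_systolic(h, x):
    """First len(x) samples of y[n] = sum_k h[k] * x[n-k].

    Hypothetical sketch of a direct-form FIR on a linear systolic
    array: cell k stores coefficient h[k] and one delay register.
    """
    M = len(h)
    x_reg = [0.0] * M  # per-cell delay registers
    out = []
    for xn in x:
        # One communication per cell: samples shift down the array.
        x_reg = [xn] + x_reg[:-1]
        # One multiply-accumulate per cell per step.
        out.append(sum(h[k] * x_reg[k] for k in range(M)))
    return out

print(fir_systolic([1, 2], [1, 1, 1]))  # -> [1, 3, 3]
```

Counting operations per step in this sketch (M shifts against M multiply-accumulates) gives a ratio near 1, consistent with the 1.2 to 1.4 values reported, and far below the ratios typical of coarse-grained MIMD workloads.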