Systolic arrays have long been used to develop custom hardware because they yield designs that are both efficient and scalable. Many researchers have explored ways to exploit systolic designs in programmable processors; however, such efforts often amount to simulating large systolic arrays on general-purpose platforms. While simulation adds flexibility and independence from problem size, it greatly reduces the efficiency of the original systolic approach. This paper presents a pattern for developing parallel programs that use systolic designs to execute efficiently, without resorting to simulation, on modern multicore processors featuring scalar operand networks. The pattern offers a compromise that achieves both high efficiency and flexibility given appropriate hardware support. Several examples illustrate its application, producing parallel implementations of matrix multiplication and convolution.
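To make the underlying idea concrete, the following is a minimal sketch of the classic time-skewed systolic dataflow for matrix multiplication, simulated sequentially for clarity. The function name and the sequential loop are ours for illustration; they are not the paper's pattern, which targets direct execution on cores connected by a scalar operand network rather than simulation.

```python
def systolic_matmul(A, B):
    """Simulate an n x n systolic array computing C = A @ B.

    Cell (i, j) accumulates a*b as row i of A streams rightward and
    column j of B streams downward. Inputs are skewed by one step per
    row/column so that matching operands meet at each cell on time.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    # The last operand pair reaches cell (n-1, n-1) at step 3n - 3,
    # so the wavefront sweeps the array in 3n - 2 steps total.
    for t in range(3 * n - 2):
        for i in range(n):
            for j in range(n):
                k = t - i - j  # index of the operand pair at cell (i, j) now
                if 0 <= k < n:
                    C[i][j] += A[i][k] * B[k][j]
    return C
```

In a true systolic execution, each cell would run concurrently and exchange operands with its neighbors every step; here the three nested loops simply replay that schedule on one processor, which is exactly the efficiency loss the paper's pattern avoids.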