Triggered instructions: a control paradigm for spatially-programmed architectures

Authors and affiliations:
Angshuman Parashar (Intel Corporation, Hudson, MA)
Michael Pellauer (Intel Corporation, Hudson, MA)
Michael Adler (Intel Corporation, Hudson, MA)
Bushra Ahsan (Intel Corporation, Hudson, MA)
Neal Crago (Intel Corporation, Hudson, MA)
Daniel Lustig (Princeton University, Princeton, NJ)
Vladimir Pavlov (Intel Corporation, Hudson, MA)
Antonia Zhai (Intel Corporation, Hudson, MA and University of Minnesota, Minneapolis, MN)
Mohit Gambhir (Intel Corporation, Hudson, MA)
Aamer Jaleel (Intel Corporation, Hudson, MA)
Randy Allmon (Intel Corporation, Hudson, MA)
Rachid Rayess (Intel Corporation, Hudson, MA)
Stephen Maresh (Intel Corporation, Hudson, MA)
Joel Emer (Intel Corporation, Hudson, MA and CSAIL, MIT, Cambridge, MA)
Venue:
Proceedings of the 40th Annual International Symposium on Computer Architecture
Year:
2013

Abstract

In this paper, we present triggered instructions, a novel control paradigm for arrays of processing elements (PEs) aimed at exploiting spatial parallelism. Triggered instructions completely eliminate the program counter and allow programs to transition concisely between states without explicit branch instructions. They also allow PEs to react efficiently to inter-PE communication traffic. The approach provides a unified mechanism to avoid over-serialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading, each of which requires distinct hardware mechanisms in a traditional sequential architecture. Our analysis shows that a triggered-instruction-based spatial accelerator can achieve 8X greater area-normalized performance than a traditional general-purpose processor. Further analysis shows that, compared with a program-counter-style spatial baseline, triggered control reduces the number of static and dynamic instructions in the critical paths by 62% and 64%, respectively, resulting in a speedup of 2.0X.
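To make the idea of PC-free control concrete, here is a minimal Python sketch of a single PE under a simplified, hypothetical model. Each instruction carries a trigger made of required predicate-bit values plus a list of input channels that must hold data, and a scheduler fires the highest-priority instruction whose trigger is true. The class names, the predicates `compared` and `a_le_b`, and the first-ready priority scheduler are assumptions for illustration, not the paper's actual ISA or microarchitecture. The example merges two sorted input streams; the PE moves between states only by updating predicates, so no program counter or branch instruction appears.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional


@dataclass
class PE:
    """Hypothetical PE state: predicate bits plus input/output channel queues.
    There is deliberately no program counter anywhere in this state."""
    preds: Dict[str, bool]
    inputs: Dict[str, Deque[int]]
    outputs: Dict[str, Deque[int]]


@dataclass
class TriggeredInstr:
    name: str
    pred_req: Dict[str, bool]      # predicate values the trigger requires
    needs_input: List[str]         # input channels that must hold data
    action: Callable[[PE], None]   # datapath operation plus predicate updates

    def ready(self, pe: PE) -> bool:
        # The trigger is a conjunction of predicate checks and channel checks.
        return (all(pe.preds[p] == v for p, v in self.pred_req.items())
                and all(len(pe.inputs[c]) > 0 for c in self.needs_input))


def step(pe: PE, program: List[TriggeredInstr]) -> Optional[str]:
    """Fire the first-listed (highest-priority) instruction whose trigger is true."""
    for instr in program:
        if instr.ready(pe):
            instr.action(pe)
            return instr.name
    return None  # nothing is ready: the PE idles until new input arrives


def run(pe: PE, program: List[TriggeredInstr], max_steps: int = 100) -> List[str]:
    trace = []
    for _ in range(max_steps):
        fired = step(pe, program)
        if fired is None:
            break
        trace.append(fired)
    return trace


# Hypothetical example program: merge sorted streams "a" and "b" into "out".
def compare(pe: PE) -> None:
    # Record the comparison result in a predicate instead of branching on it.
    pe.preds["a_le_b"] = pe.inputs["a"][0] <= pe.inputs["b"][0]
    pe.preds["compared"] = True


def send_a(pe: PE) -> None:
    pe.outputs["out"].append(pe.inputs["a"].popleft())
    pe.preds["compared"] = False


def send_b(pe: PE) -> None:
    pe.outputs["out"].append(pe.inputs["b"].popleft())
    pe.preds["compared"] = False


MERGE = [
    TriggeredInstr("cmp", {"compared": False}, ["a", "b"], compare),
    TriggeredInstr("send_a", {"compared": True, "a_le_b": True}, ["a"], send_a),
    TriggeredInstr("send_b", {"compared": True, "a_le_b": False}, ["b"], send_b),
]

pe = PE(preds={"compared": False, "a_le_b": False},
        inputs={"a": deque([1, 4, 7]), "b": deque([2, 3, 8])},
        outputs={"out": deque()})
trace = run(pe, MERGE)
print(list(pe.outputs["out"]))  # [1, 2, 3, 4, 7]
```

Once channel `a` drains, no trigger is true and the PE simply idles with `8` still queued in `b`, much as a hardware PE would wait for more input; handling end-of-stream would need additional trigger conditions that this sketch does not model.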