In this paper, we present triggered instructions, a novel control paradigm for arrays of processing elements (PEs) aimed at exploiting spatial parallelism. Triggered instructions completely eliminate the program counter and allow programs to transition concisely between states without explicit branch instructions. They also allow PEs to react efficiently to inter-PE communication traffic. The approach provides a unified mechanism to avoid over-serialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading, each of which requires distinct hardware mechanisms in a traditional sequential architecture. Our analysis shows that a triggered-instruction-based spatial accelerator can achieve 8X greater area-normalized performance than a traditional general-purpose processor. Further analysis shows that triggered control reduces the number of static and dynamic instructions in the critical paths by 62% and 64%, respectively, over a program-counter-style spatial baseline, resulting in a speedup of 2.0X.
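To make the control model concrete, the following is a minimal software sketch of a single PE without a program counter. It is an illustration under stated assumptions, not the architecture evaluated in the paper: the names `TriggeredInstruction`, `PE`, and `merge_streams` are hypothetical, Python deques stand in for inter-PE channels, an empty deque is treated as end of stream, and ties between ready instructions are broken by program order. Each instruction carries a trigger (a guard over the PE's state) and an action; on every cycle the PE fires one instruction whose trigger holds, so control flow emerges from the guards rather than from branches.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TriggeredInstruction:
    name: str
    trigger: Callable[["PE"], bool]  # guard over channel status and data
    action: Callable[["PE"], None]   # operation performed when fired


class PE:
    """Hypothetical processing element with no program counter."""

    def __init__(self, program: List[TriggeredInstruction], a, b):
        self.program = program
        self.in_a = deque(a)  # stand-in for input channel A
        self.in_b = deque(b)  # stand-in for input channel B
        self.out = []         # stand-in for the output channel

    def step(self) -> Optional[str]:
        # Fire the first instruction whose trigger holds (assumed priority:
        # program order). Returns its name, or None if nothing is ready.
        for instr in self.program:
            if instr.trigger(self):
                instr.action(self)
                return instr.name
        return None

    def run(self, max_cycles: int = 1000) -> List[str]:
        trace = []
        for _ in range(max_cycles):
            fired = self.step()
            if fired is None:
                break
            trace.append(fired)
        return trace


def emit_a(pe: PE) -> None:
    pe.out.append(pe.in_a.popleft())


def emit_b(pe: PE) -> None:
    pe.out.append(pe.in_b.popleft())


# Merging two sorted streams: four guarded instructions, no branches.
MERGE_PROGRAM = [
    TriggeredInstruction(
        "take_a",
        lambda pe: bool(pe.in_a) and bool(pe.in_b) and pe.in_a[0] <= pe.in_b[0],
        emit_a,
    ),
    TriggeredInstruction(
        "take_b",
        lambda pe: bool(pe.in_a) and bool(pe.in_b) and pe.in_a[0] > pe.in_b[0],
        emit_b,
    ),
    TriggeredInstruction("drain_a", lambda pe: bool(pe.in_a) and not pe.in_b, emit_a),
    TriggeredInstruction("drain_b", lambda pe: bool(pe.in_b) and not pe.in_a, emit_b),
]


def merge_streams(a, b):
    pe = PE(MERGE_PROGRAM, a, b)
    trace = pe.run()
    return pe.out, trace


if __name__ == "__main__":
    merged, fired = merge_streams([1, 4, 7], [2, 3, 9])
    print(merged)
    print(fired)
```

In a program-counter design, the same merge would need compare, branch, and move instructions executed in sequence; here the comparison is folded into the guards, which is one way to picture how triggered control can shorten the instruction count on the critical path. Real hardware would evaluate all triggers in parallel and would need an explicit end-of-stream signal rather than an empty-queue test.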