Spatial computation

Authors:
Mihai Budiu;Girish Venkataramani;Tiberiu Chelcea;Seth Copen Goldstein
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Year:
2004

Abstract

This paper describes a computer architecture, Spatial Computation (SC), which is based on the translation of high-level language programs directly into hardware structures. SC program implementations are completely distributed, with no centralized control. SC circuits are optimized for wires at the expense of computation units.

In this paper we investigate a particular implementation of SC: ASH (Application-Specific Hardware). Under the assumption that computation is cheaper than communication, ASH replicates computation units to simplify interconnect, building a system which uses very simple, completely dedicated communication channels. As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ASH relies on very simple hardware primitives, using no associative structures, no multiported register files, no scheduling logic, no broadcast, and no clocks. As a consequence, ASH hardware is fast and extremely power efficient.

In this work we demonstrate three features of ASH: (1) that such architectures can be built by automatic compilation of C programs; (2) that distributed computation is in some respects fundamentally different from monolithic superscalar processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy compared to high-end superscalar processors, while being on average only 33% slower in performance (3.5x worst-case).
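
The energy and performance figures quoted in the abstract can be combined into a rough energy-delay comparison. The sketch below is only back-of-the-envelope arithmetic, not a result from the paper: it assumes "three orders of magnitude" means a factor of exactly 1000, reads "33% slower" as a runtime 1.33 times that of the superscalar baseline, and uses a hypothetical helper named energy_delay_gain.

```python
# Figures read off the abstract; how they are interpreted here is an assumption.
ENERGY_RATIO = 1e-3      # ASH energy / superscalar energy ("three orders of magnitude less")
AVG_TIME_RATIO = 1.33    # ASH runtime / superscalar runtime, reading "33% slower" as 1.33x
WORST_TIME_RATIO = 3.5   # worst-case slowdown quoted in the abstract


def energy_delay_gain(energy_ratio: float, time_ratio: float) -> float:
    """Return how many times smaller the energy-delay product becomes.

    Energy-delay product is energy multiplied by runtime, so the ratio of
    products is the product of the two ratios; the gain is its reciprocal.
    """
    return 1.0 / (energy_ratio * time_ratio)


avg_gain = energy_delay_gain(ENERGY_RATIO, AVG_TIME_RATIO)
worst_gain = energy_delay_gain(ENERGY_RATIO, WORST_TIME_RATIO)

print(f"average-case energy-delay gain: {avg_gain:.0f}x")
print(f"worst-case energy-delay gain: {worst_gain:.0f}x")
```

Under these assumptions the energy saving outweighs the slowdown by a wide margin even in the worst case, which is consistent with the abstract's emphasis on power efficiency rather than raw speed.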