The Manchester prototype dataflow computer
Communications of the ACM - Special section on computer architecture
The misconstrued semicolon: reconciling imperative languages and dataflow machines
The misconstrued semicolon: reconciling imperative languages and dataflow machines
Evaluation of a prototype data flow processor of the SIGMA-1 for scientific computations
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Measuring Parallelism in Computation-Intensive Scientific/Engineering Applications
IEEE Transactions on Computers
Resource requirements of dataflow programs
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
The Epsilon dataflow processor
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
An architecture of a dataflow single chip processor
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Executing a Program on the MIT Tagged-Token Dataflow Architecture
IEEE Transactions on Computers
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Numerical recipes in C (2nd ed.): the art of scientific computing
Numerical recipes in C (2nd ed.): the art of scientific computing
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References
IEEE Transactions on Computers
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Space-time scheduling of instruction-level parallelism on a raw machine
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
DIVA: a reliable substrate for deep submicron microarchitecture design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Monsoon: an explicit token-store architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Smart Memories: a modular reconfigurable architecture
Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
The VAL Language: Description and Analysis
ACM Transactions on Programming Languages and Systems (TOPLAS)
Lucid, a nonprocedural language with iteration
Communications of the ACM
NanoFabrics: spatial computing using molecular electronics
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
An instruction set and microarchitecture for instruction level distributed processing
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A design space evaluation of grid processor architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Introduction: Special Issue on Microprocessor Verifications
Formal Methods in System Design
A preliminary architecture for a basic data-flow processor
ISCA '75 Proceedings of the 2nd annual symposium on Computer architecture
Two Fundamental Limits on Dataflow Multiprocessing
PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
DDDP-a Distributed Data Driven Processor
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Using an oracle to measure potential parallelism in single instruction stream programs
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
The architecture and system method of DDM1: A recursively structured Data Driven Machine
ISCA '78 Proceedings of the 5th annual symposium on Computer architecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Proceedings of the 30th annual international symposium on Computer architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
An evaluation of speculative instruction execution on simultaneous multithreaded processors
ACM Transactions on Computer Systems (TOCS)
TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP
ACM Transactions on Architecture and Code Optimization (TACO)
Scalable selective re-execution for EDGE architectures
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
IEEE Transactions on Parallel and Distributed Systems
Dynamic loop pipelining in data-driven architectures
Proceedings of the 2nd conference on Computing frontiers
A High Throughput String Matching Architecture for Intrusion Detection and Prevention
Proceedings of the 32nd annual international symposium on Computer Architecture
Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks
Proceedings of the 32nd annual international symposium on Computer Architecture
Distributed Data Cache Designs for Clustered VLIW Processors
IEEE Transactions on Computers
Designing real-time H.264 decoders with dataflow architectures
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Hardware-modulated parallelism in chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Placement for configurable dataflow architecture
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Constructing Virtual Architectures on a Tiled Processor
Proceedings of the International Symposium on Code Generation and Optimization
Compiling for EDGE Architectures
Proceedings of the International Symposium on Code Generation and Optimization
Bit-split string-matching engines for intrusion detection and prevention
ACM Transactions on Architecture and Code Optimization (TACO)
Area-Performance Trade-offs in Tiled Dataflow Architectures
Proceedings of the 33rd annual international symposium on Computer Architecture
Modeling instruction placement on a spatial architecture
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Reducing control overhead in dataflow architectures
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Data-Driven Multithreading Using Conventional Microprocessors
IEEE Transactions on Parallel and Distributed Systems
A spatial path scheduling algorithm for EDGE architectures
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Instruction scheduling for a tiled dataflow architecture
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Tartan: evaluating spatial computation for whole program execution
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
CAPSULE: Hardware-Assisted Parallel Execution of Component-Based Programs
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 4th international conference on Computing frontiers
High performance dense linear algebra on a spatially distributed processor
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Communications of the ACM - Web science
Counting Dependence Predictors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Transparent reconfigurable acceleration for heterogeneous embedded applications
Proceedings of the conference on Design, automation and test in Europe
A Non-blocking Multithreaded Architecture with Support for Speculative Threads
ICA3PP '08 Proceedings of the 8th international conference on Algorithms and Architectures for Parallel Processing
HeDGE: Hybrid Dataflow Graph Execution in the Issue Logic
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Compiler Controlled Speculation for Power Aware ILP Extraction in Dataflow Architectures
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Efficient unicast and multicast support for CMPs
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Strategies for mapping dataflow blocks to distributed hardware
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Loop-Aware Instruction Scheduling with Dynamic Contention Tracking for Tiled Dataflow Architectures
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
PLUG: flexible lookup modules for rapid deployment of new protocols in high-speed routers
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
Design and optimization of the store vectors memory dependence predictor
ACM Transactions on Architecture and Code Optimization (TACO)
rMPI: message passing on multicore processors with on-chip interconnect
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Chip multiprocessor based on data-driven multithreading model
International Journal of High Performance Systems Architecture
WSEAS Transactions on Computers
WSEAS Transactions on Computers
Design and implementation of the PLUG architecture for programmable and efficient network lookups
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Task superscalar: using processors as functional units
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
A scheduling approach for distributed resource architectures with scarce communication resources
International Journal of High Performance Systems Architecture
PRADA: a high-performance reconfigurable parallel architecture based on the dataflow model
International Journal of High Performance Systems Architecture
Task Superscalar: An Out-of-Order Task Pipeline
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
The C compiler generating a source file in VHDL for a dynamic dataflow machine
MMACTEE'09 Proceedings of the 11th WSEAS international conference on Mathematical methods and computational techniques in electrical engineering
A pattern for efficient parallel computation on multicore processors with scalar operand networks
Proceedings of the 2010 Workshop on Parallel Programming Patterns
Trebuchet: exploring TLP with dataflow virtualisation
International Journal of High Performance Systems Architecture
CRIB: consolidated rename, issue, and bypass
Proceedings of the 38th annual international symposium on Computer architecture
An FPGA-based heterogeneous coarse-grained dynamically reconfigurable architecture
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Hardware support for OpenMP collective operations
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Bundled execution of recurring traces for energy-efficient general purpose processing
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Distributed replay protocol for distributed uniprocessors
Proceedings of the 26th ACM international conference on Supercomputing
Mixing static and dynamic strategies for high performance and low area reconfigurable systems
International Journal of High Performance Systems Architecture
Viper: virtual pipelines for enhanced reliability
Proceedings of the 39th Annual International Symposium on Computer Architecture
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
MP-Tomasulo: A Dependency-Aware Automatic Parallel Execution Engine for Sequential Programs
ACM Transactions on Architecture and Code Optimization (TACO)
A general constraint-centric scheduling framework for spatial architectures
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
The von Neumann architecture is due for retirement
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
The sharing architecture: sub-core configurability for IaaS clouds
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
Silicon technology will continue to provide an exponential increasein the availability of raw transistors. Effectively translatingthis resource into application performance, however,is an open challenge. Ever increasing wire-delay relativeto switching speed and the exponential cost of circuit complexitymake simply scaling up existing processor designs futile.In this paper, we present an alternative to superscalardesign, WaveScalar. WaveScalar is a dataflow instructionset architecture and execution model designed for scalable,low-complexity/high-performance processors. WaveScalar isunique among dataflow architectures in efficiently providingtraditional memory semantics. At last, a dataflow machinecan run "real-world" programs, written in any language,without sacrificing parallelism.The WaveScalar ISA is designed to run on an intelligentmemory system. Each instruction in a WaveScalar binary executesin place in the memory system and explicitly communicateswith its dependents in dataflow fashion. WaveScalararchitectures cache instructions and the values they operateon in a WaveCache, a simple grid of "alu-in-cache" nodes.By co-locating computation and data in physical space, theWaveCache minimizes long wire, high-latency communication.This paper introduces the WaveScalar instruction setand evaluates a simulated implementation based on currenttechnology. Results for the SPEC and Mediabench applicationsdemonstrate that the WaveCache out-performs an aggressivelyconfigured superscalar design by 2-7 times, withample opportunities for future optimizations.