Tiled architectures, such as RAW, SmartMemories, TRIPS, and WaveScalar, promise to address several issues facing conventional processors, including complexity, wire-delay, and performance. The basic premise of these architectures is that larger, higher-performance implementations can be constructed by replicating the basic tile across the chip. This paper explores the area-performance trade-offs when designing one such tiled architecture, WaveScalar. We use a synthesizable RTL model and cycle-level simulator to perform an area/performance Pareto analysis of over 200 WaveScalar processor designs ranging in size from 19 mm² to 378 mm² and having a 22 FO4 cycle time. We demonstrate that, for multi-threaded workloads, WaveScalar performance scales almost ideally from 19 to 101 mm² when optimized for area efficiency and from 44 to 202 mm² when optimized for peak performance. Our analysis reveals that WaveScalar's hierarchical interconnect plays an important role in overall scalability, and that WaveScalar achieves the same (or higher) performance in substantially less area than either an aggressive out-of-order superscalar or Sun's Niagara CMP processor.
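As an illustration of what an area/performance Pareto analysis involves, the sketch below filters a set of candidate designs down to those that no other design beats on both area and performance. It is a minimal sketch, not the paper's tooling: the function name `pareto_front`, the tuple layout, and every design point in `example_designs` are hypothetical values invented for the example and are not results from the study.

```python
from typing import List, Tuple

# (name, area in mm^2, performance in arbitrary units); layout is an assumption
Design = Tuple[str, float, float]


def pareto_front(designs: List[Design]) -> List[Design]:
    """Return the designs that are not dominated on (area, performance).

    A design is dominated if some other design is no larger and no slower,
    and strictly better in at least one of the two. Exact duplicates of a
    kept design are dropped.
    """
    # Visit designs from smallest to largest area; among equal areas,
    # visit the fastest first so slower same-area designs are rejected.
    ordered = sorted(designs, key=lambda d: (d[1], -d[2]))
    front: List[Design] = []
    best_perf = float("-inf")
    for design in ordered:
        # Keep a design only if it outperforms every smaller-or-equal design.
        if design[2] > best_perf:
            front.append(design)
            best_perf = design[2]
    return front


# Hypothetical design points, made up purely for illustration.
example_designs: List[Design] = [
    ("A", 20.0, 1.0),
    ("B", 40.0, 1.8),
    ("C", 45.0, 1.5),   # larger and slower than B, so dominated
    ("D", 100.0, 4.0),
    ("E", 120.0, 3.9),  # larger and slower than D, so dominated
]

if __name__ == "__main__":
    for name, area, perf in pareto_front(example_designs):
        print(f"{name}: {area} mm^2, performance {perf}")
```

Sorting by area first keeps the filter to a single pass, which is convenient when sweeping a few hundred configurations.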