Scaling to the End of Silicon with EDGE Architectures

Authors:
Doug Burger; Stephen W. Keckler; Kathryn S. McKinley; Mike Dahlin; Lizy K. John; Calvin Lin; Charles R. Moore; James Burrill; Robert G. McDonald; William Yoder; the TRIPS Team
Affiliations:
The University of Texas at Austin (all authors)
Venue:
Computer
Year:
2004

Abstract

Post-RISC microprocessor designs must introduce new ISAs to address the challenges that modern CMOS technologies pose while also exploiting the massive levels of integration now possible. To meet these challenges, the TRIPS Team at the University of Texas at Austin has developed a new class of ISAs, called Explicit Data Graph Execution (EDGE), that will match the characteristics of semiconductor technology over the next decade. EDGE architectures appear to offer a progressively better solution as technology scales down to the end of silicon, with each generation providing a richer spatial substrate at the expense of increased global communication delays.
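The abstract names the EDGE idea without showing it. In the EDGE model, within a block a producer instruction names the consumer instructions that should receive its result, and each instruction fires as soon as all of its operands have arrived, rather than passing intermediate values through a shared register file. The Python sketch below is a deliberately simplified, hypothetical model of that dataflow behavior. `Inst`, `execute_block`, and the `(instruction id, operand slot)` target encoding are illustrative assumptions, not the TRIPS instruction format or toolchain.

```python
import operator
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical operand address: (consumer instruction id, operand slot).
# Illustrative only; this is NOT the actual TRIPS/EDGE instruction encoding.
Target = Tuple[int, int]


@dataclass
class Inst:
    op: Callable[..., int]  # operation applied once every operand has arrived
    arity: int              # number of operands the instruction waits for
    targets: List[Target]   # consumers named directly by the producer; no register names


def execute_block(insts: Dict[int, Inst], inputs: List[Tuple[Target, int]]) -> Dict[int, int]:
    """Run one block in dataflow order; return the result of every instruction that fired."""
    buffers: Dict[int, Dict[int, int]] = {iid: {} for iid in insts}  # per-instruction operand slots
    ready = deque()  # ids of instructions whose operands are all present
    results: Dict[int, int] = {}

    def deliver(target: Target, value: int) -> None:
        iid, slot = target
        buffers[iid][slot] = value
        # An instruction becomes ready the moment its last operand arrives.
        if len(buffers[iid]) == insts[iid].arity:
            ready.append(iid)

    for target, value in inputs:  # values injected into the block from outside
        deliver(target, value)

    while ready:
        iid = ready.popleft()
        inst = insts[iid]
        value = inst.op(*(buffers[iid][s] for s in range(inst.arity)))
        results[iid] = value
        for target in inst.targets:  # forward the result straight to each named consumer
            deliver(target, value)
    return results


# (a + b) * (a - b): instructions 0 and 1 send their results directly to instruction 2.
block = {
    0: Inst(operator.add, 2, [(2, 0)]),
    1: Inst(operator.sub, 2, [(2, 1)]),
    2: Inst(operator.mul, 2, []),
}
a, b = 7, 3
out = execute_block(block, [((0, 0), a), ((1, 0), a), ((0, 1), b), ((1, 1), b)])
print(out)  # {0: 10, 1: 4, 2: 40}
```

Because operand buffers are local to each call, the same block can be executed again with new inputs. Execution order here is set only by operand arrival, which is the property that lets an EDGE-style design spread instructions across a spatial substrate, at the cost of routing operands between them.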