An evaluation of the TRIPS computer system

Authors:
Mark Gebhart;Bertrand A. Maher;Katherine E. Coons;Jeff Diamond;Paul Gratz;Mario Marino;Nitya Ranganathan;Behnam Robatmili;Aaron Smith;James Burrill;Stephen W. Keckler;Doug Burger;Kathryn S. McKinley
Affiliations:
The University of Texas at Austin, Austin, TX, USA (all authors)
Venue:
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Year:
2009

Abstract

The TRIPS system employs a new instruction set architecture (ISA) called Explicit Data Graph Execution (EDGE) that renegotiates the boundary between hardware and software to expose and exploit concurrency. EDGE ISAs use a block-atomic execution model in which blocks are composed of dataflow instructions. The goal of the TRIPS design is to mine concurrency for high performance while tolerating emerging technology scaling challenges, such as increasing wire delays and power consumption. This paper evaluates how well TRIPS meets this goal through a detailed ISA and performance analysis. We compare performance, using cycle counts, to commercial processors. On SPEC CPU2000, the Intel Core 2 outperforms compiled TRIPS code in most cases, although TRIPS matches a Pentium 4. On simple benchmarks, compiled TRIPS code outperforms the Core 2 by 10% and hand-optimized TRIPS code outperforms it by a factor of 3. Compared to conventional ISAs, the block-atomic model provides a larger instruction window, increases concurrency at a cost of more instructions executed, and replaces register and memory accesses with more efficient direct instruction-to-instruction communication. Our analysis suggests ISA, microarchitecture, and compiler enhancements for addressing weaknesses in TRIPS and indicates that EDGE architectures have the potential to exploit greater concurrency in future technologies.
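
The idea of direct instruction-to-instruction communication inside a block can be illustrated with a toy interpreter. This is only a sketch under stated assumptions, not the TRIPS ISA or its hardware: the `Instr` class, the `run_block` function, and the `(consumer index, operand slot)` target encoding are hypothetical names invented here. Each instruction names its consumers instead of writing a register, fires once all of its operands have arrived, and the block's outputs are returned together only after every instruction has executed, loosely mirroring block-atomic commit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Instr:
    """Hypothetical dataflow instruction (illustrative, not the TRIPS encoding)."""
    op: Callable[..., int]            # operation applied once all operands arrive
    arity: int                        # number of operands to wait for
    targets: List[Tuple[int, int]]    # (consumer index, operand slot) pairs
    operands: Dict[int, int] = field(default_factory=dict)


def run_block(instrs: List[Instr],
              inputs: Dict[Tuple[int, int], int]) -> Dict[int, int]:
    """Run one block in dataflow order.

    `inputs` maps (instruction index, operand slot) to a block input value.
    Returns the results of instructions that have no consumers, all at once.
    """
    for instr in instrs:
        instr.operands.clear()        # start from a clean block state

    ready: List[int] = []             # instructions whose operands are complete
    outputs: Dict[int, int] = {}

    def deliver(idx: int, slot: int, value: int) -> None:
        instr = instrs[idx]
        instr.operands[slot] = value
        if len(instr.operands) == instr.arity:
            ready.append(idx)

    for (idx, slot), value in inputs.items():
        deliver(idx, slot, value)

    while ready:
        idx = ready.pop()
        instr = instrs[idx]
        args = [instr.operands[s] for s in range(instr.arity)]
        result = instr.op(*args)
        if not instr.targets:
            outputs[idx] = result     # block output, held until the block ends
        for t_idx, t_slot in instr.targets:
            deliver(t_idx, t_slot, result)   # forward straight to the consumer

    return outputs


# Block computing (a + b) * (a - b) with no intermediate register names.
block = [
    Instr(op=lambda x, y: x + y, arity=2, targets=[(2, 0)]),
    Instr(op=lambda x, y: x - y, arity=2, targets=[(2, 1)]),
    Instr(op=lambda x, y: x * y, arity=2, targets=[]),
]
result = run_block(block, {(0, 0): 5, (0, 1): 3, (1, 0): 5, (1, 1): 3})
```

In this sketch the add and subtract instructions are both ready as soon as the block inputs arrive, so their order is not fixed by the program text; only the multiply must wait for its two producers.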