TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP

Authors:
Karthikeyan Sankaralingam; Ramadass Nagarajan; Haiming Liu; Changkyu Kim; Jaehyuk Huh; Nitya Ranganathan; Doug Burger; Stephen W. Keckler; Robert G. McDonald; Charles R. Moore
Affiliations:
The University of Texas at Austin, Austin, TX (all authors)
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2004

Abstract

This paper describes the polymorphous TRIPS architecture, which can be configured for different granularities and types of parallelism. The TRIPS architecture is the first in a class of post-RISC, dataflow-like instruction sets called explicit data-graph execution (EDGE). This EDGE ISA is coupled with hardware mechanisms that enable the processing cores and the on-chip memory system to be configured and combined in different modes for instruction-, data-, or thread-level parallelism. To adapt to small- and large-grain concurrency, the TRIPS architecture prototype contains two out-of-order, 16-wide-issue grid processor cores, which can be partitioned when easily extractable fine-grained parallelism exists. This approach to polymorphism provides better performance across a wide range of application types than an approach in which many small processors are aggregated to run workloads with irregular parallelism. Our results show that high performance can be obtained in each of the three modes (ILP, TLP, and DLP), demonstrating the viability of the polymorphous coarse-grained approach for future microprocessors.