Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture

Authors:
Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, Charles R. Moore
Affiliation:
The University of Texas at Austin (all authors)
Venue:
Proceedings of the 30th annual international symposium on Computer architecture
Year:
2003

Abstract

This paper describes the polymorphous TRIPS architecture, which can be configured for different granularities and types of parallelism. TRIPS contains mechanisms that enable the processing cores and the on-chip memory system to be configured and combined in different modes for instruction-level (ILP), data-level (DLP), or thread-level (TLP) parallelism. To adapt to both small- and large-grain concurrency, the TRIPS architecture contains four out-of-order, 16-wide-issue Grid Processor cores, which can be partitioned when easily extractable fine-grained parallelism exists. This approach to polymorphism provides better performance across a wide range of application types than an approach in which many small processors are aggregated to run workloads with irregular parallelism. Our results show that high performance can be obtained in each of the three modes (ILP, TLP, and DLP), demonstrating the viability of the polymorphous coarse-grained approach for future microprocessors.
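The abstract's core figures (four cores, each 16-wide issue) imply a chip-wide peak of 4 × 16 = 64 instructions issued per cycle. The Python sketch below is a hypothetical back-of-the-envelope model, not the paper's mechanism: `CoreConfig` and `chip_peak_issue` are illustrative names, and the model assumes that a partitioned core splits its issue slots evenly among the threads sharing it. It shows only that partitioning a core trades per-thread issue width for thread count while the aggregate peak stays fixed. DLP mode and the on-chip memory system are not modeled.

```python
from dataclasses import dataclass

# Figures stated in the abstract: four cores, each 16-wide issue.
NUM_CORES = 4
ISSUE_WIDTH_PER_CORE = 16


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical per-core configuration: how many threads share the core."""

    threads: int = 1

    def issue_slots_per_thread(self) -> int:
        # ASSUMPTION: issue slots are split evenly among the threads sharing
        # a partitioned core; the paper's actual partitioning scheme is not
        # modeled here.
        if self.threads < 1 or ISSUE_WIDTH_PER_CORE % self.threads != 0:
            raise ValueError("threads must evenly divide the core's issue width")
        return ISSUE_WIDTH_PER_CORE // self.threads


def chip_peak_issue(configs: list) -> int:
    """Peak instructions issued per cycle, summed over all cores."""
    if len(configs) != NUM_CORES:
        raise ValueError(f"expected {NUM_CORES} core configurations")
    return sum(c.issue_slots_per_thread() * c.threads for c in configs)


if __name__ == "__main__":
    # ILP-style: one thread per core uses the full 16-wide core.
    ilp = [CoreConfig(threads=1)] * NUM_CORES
    # TLP-style: each core partitioned among four threads (hypothetical even split).
    tlp = [CoreConfig(threads=4)] * NUM_CORES
    print(chip_peak_issue(ilp), ilp[0].issue_slots_per_thread())  # 64 16
    print(chip_peak_issue(tlp), tlp[0].issue_slots_per_thread())  # 64 4
```

The even split is the simplest assumption that keeps the arithmetic honest; a thread count that does not divide 16 is rejected rather than rounded, so the model never reports more issue slots than the abstract's figures allow.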