Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

  • Authors:
  • Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, Rebecca L. Stamm

  • Affiliations:
  • Dept. of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA (Tullsen, Eggers, Levy, Lo)
  • Digital Equipment Corporation, HLO2-3/J3, 77 Reed Road, Hudson, MA (Emer, Stamm)

  • Venue:
  • ISCA '96: Proceedings of the 23rd Annual International Symposium on Computer Architecture
  • Year:
  • 1996

Abstract

Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads most efficiently using the processor each cycle, thereby providing the "best" instructions to the processor.
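The "favor for fetch and issue" mechanism described in the abstract refers to the paper's fetch-priority heuristics, of which ICOUNT (prioritizing threads with the fewest instructions in the decode, rename, and queue stages) performed best. The sketch below is a minimal, hypothetical model of that priority computation only; the thread count, occupancy numbers, and function names are illustrative assumptions, not the paper's implementation.

```c
/*
 * Minimal sketch of an ICOUNT-style fetch-priority heuristic: each cycle,
 * fetch from the thread with the fewest instructions in the pre-issue
 * pipeline stages (decode, rename, instruction queues).  All names and
 * sizes are illustrative assumptions.
 */
#include <stdio.h>

#define NUM_THREADS 8

typedef struct {
    int id;
    int inflight;   /* instructions in decode/rename/queue stages */
} thread_state;

/* Return the index of the thread with the fewest in-flight instructions.
 * Ties break toward the lower index, a simplification of the rotating
 * priority a real fetch unit might use. */
static int icount_pick(const thread_state threads[], int n)
{
    int best = 0;
    for (int i = 1; i < n; i++) {
        if (threads[i].inflight < threads[best].inflight)
            best = i;
    }
    return best;
}

int main(void)
{
    /* Hypothetical snapshot of per-thread pipeline occupancy. */
    thread_state threads[NUM_THREADS] = {
        {0, 12}, {1, 3}, {2, 7}, {3, 15},
        {4, 5},  {5, 9}, {6, 2}, {7, 11},
    };

    int chosen = icount_pick(threads, NUM_THREADS);
    printf("fetch this cycle from thread %d (%d instructions in flight)\n",
           chosen, threads[chosen].inflight);
    return 0;
}
```

In the paper's best-performing variant, ICOUNT.2.8, the two highest-priority threads share the fetch bandwidth each cycle (up to eight instructions total); the single-thread pick above captures only the priority computation, not that partitioning.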