Simultaneous subordinate microthreading (SSMT)

Authors:
Robert S. Chappell;Jared Stark;Sangwook P. Kim;Steven K. Reinhardt;Yale N. Patt
Affiliations:
EECS Department (ACAL), The University of Michigan, Ann Arbor, Michigan;EECS Department (ACAL), The University of Michigan, Ann Arbor, Michigan;Intel Corporation, Santa Clara, CA;EECS Department (ACAL), The University of Michigan, Ann Arbor, Michigan;EECS Department (ACAL), The University of Michigan, Ann Arbor, Michigan
Venue:
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Year:
1999

Citing 18
Cited 77

Overlapped loop support in the Cydra 5

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Multithreading: a revisionist view of dataflow architectures

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Alternative implementations of two-level adaptive branch prediction

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
An elementary processor architecture with simultaneous instruction issuing from multiple threads

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Compiler techniques for data prefetching on the PowerPC

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Disjoint eager execution: an optimal form of speculative execution

Proceedings of the 28th annual international symposium on Microarchitecture
Using hybrid branch predictors to improve branch prediction accuracy in the presence of context switches

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Informing memory operations: providing memory performance feedback in modern processors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Compiler synthesized dynamic branch prediction

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Profetching and memory system behavior of the SPEC95 benchmark suite

IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
An analysis of correlation and predictability: what makes two-level branch predictors work

Proceedings of the 25th annual international symposium on Computer architecture
Threaded multiple path execution

Proceedings of the 25th annual international symposium on Computer architecture
Cooperative prefetching: compiler and hardware support for effective instruction prefetching in modern processors

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Variable length path branch prediction

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Architectural Support for Compiler-Synthesized Dynamic Branch Prediction Strategies: Rationale and Initial Results

HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture

Improving virtual function call target prediction via dependence-based pre-computation

ICS '99 Proceedings of the 13th international conference on Supercomputing
The use of multithreading for exception handling

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Understanding the backward slices of performance degrading instructions

Proceedings of the 27th annual international symposium on Computer architecture
Instruction path coprocessors

Proceedings of the 27th annual international symposium on Computer architecture
An analysis of operating system behavior on a simultaneous multithreaded architecture

ACM SIGPLAN Notices
Slipstream processors: improving both performance and fault tolerance

ACM SIGPLAN Notices
A study of slipstream processors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Relational profiling: enabling thread-level parallelism in virtual machines

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Slice-processors: an implementation of operation-based prediction

ICS '01 Proceedings of the 15th international conference on Supercomputing
An analysis of operating system behavior on a simultaneous multithreaded architecture

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Execution-based prediction using speculative slices

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Slipstream processors: improving both performance and fault tolerance

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Speculative precomputation: long-range prefetching of delinquent loads

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Dynamically allocating processor resources between nearby and distant ILP

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Data prefetching by dependence graph precomputation

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Scheduling coarse-grain operations for VLIW processors

ISSS '00 Proceedings of the 13th international symposium on System synthesis
Using a user-level memory thread for correlation prefetching

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Difficult-path branch prediction using subordinate microthreads

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Performance characterization of a hardware mechanism for dynamic optimization

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Design and evaluation of compiler algorithms for pre-execution

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Speculative Multithreaded Processors

Computer
Letters

Computer
A survey of processors with explicit multithreading

ACM Computing Surveys (CSUR)
Amir Roth: Speculative Multithreaded Processors

HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
Transparent Threads: Resource Sharing in SMT Processors for High Single-Thread Performance

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Microprocessors - 10 Years Back, 10 Years Ahead

Informatics - 10 Years Back. 10 Years Ahead.
Microarchitectural support for precomputation microthreads

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A quantitative framework for automated pre-execution thread selection

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A framework for modeling and optimization of prescient instruction prefetch

SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Recycling waste: exploiting wrong-path execution to improve branch prediction

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Mini-Threads: Increasing TLP on Small-Scale SMT Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Correlation Prefetching with a User-Level Memory Thread

IEEE Transactions on Parallel and Distributed Systems
Beating in-order stalls with "flea-flicker" two-pass pipelining

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
The need for adaptive dynamic thread scheduling

High performance scientific and engineering computing
A first glance at Kilo-instruction based multiprocessors

Proceedings of the 1st conference on Computing frontiers
Billion-Transistor Architectures: There and Back Again

Computer
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Tolerating Late Memory Traps in Dynamically Scheduled Processors

IEEE Transactions on Computers
A study of source-level compiler algorithms for automatic construction of pre-execution code

ACM Transactions on Computer Systems (TOCS)
Continual flow pipelines

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Helper threads via virtual multithreading on an experimental itanium® 2 processor-based platform

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Scaling Up the Atlas Chip-Multiprocessor

IEEE Transactions on Computers
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Helper Threads via Virtual Multithreading

IEEE Micro
Toward kilo-instruction processors

ACM Transactions on Architecture and Code Optimization (TACO)
Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Energy-Effectiveness of Pre-Execution and Energy-Aware P-Thread Selection

Proceedings of the 32nd annual international symposium on Computer Architecture
Techniques for Efficient Processing in Runahead Execution Engines

Proceedings of the 32nd annual international symposium on Computer Architecture
Improving database performance on simultaneous multithreading processors

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Exploiting Coarse-Grain Verification Parallelism for Power-Efficient Fault Tolerance

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
"Flea-flicker" Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Speculative execution for hiding memory latency

MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Beating In-Order Stalls with "Flea-Flicker" Two-Pass Pipelining

IEEE Transactions on Computers
Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance

IEEE Micro
Kilo-instruction processors, runahead and prefetching

Proceedings of the 3rd conference on Computing frontiers
Speculative pre-execution assisted by compiler (SPEAR)

Journal of Parallel and Distributed Computing - Special issue on parallel bioinspired algorithms
Design and evaluation of a hierarchical decoupled architecture

The Journal of Supercomputing
Data prefetching and address pre-calculation through instruction pre-execution with two-step physical register deallocation

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Accurate branch prediction for short threads

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Performance scalability of decoupled software pipelining

ACM Transactions on Architecture and Code Optimization (TACO)
An Operating System Architecture for Organic Computing in Embedded Real-Time Systems

ATC '08 Proceedings of the 5th international conference on Autonomic and Trusted Computing
Streamlining long latency instructions for seamlessly combined out-of-order and in-order execution

Microprocessors & Microsystems
A performance-correctness explicitly-decoupled architecture

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Combining thread level speculation helper threads and runahead execution

Proceedings of the 23rd international conference on Supercomputing
Reducing register file size through instruction pre-execution enhanced by value prediction

ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Software data spreading: leveraging distributed caches to improve single thread performance

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Dynamic processors demand dynamic operating systems

HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Helper thread prefetching for loosely-coupled multiprocessor systems

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Inter-core prefetching for multicore processors using migrating helper threads

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
The HELIX project: overview and directions

Proceedings of the 49th Annual Design Automation Conference
HELIX: automatic parallelization of irregular programs for chip multiprocessing

Proceedings of the Tenth International Symposium on Code Generation and Optimization
Mixed speculative multithreaded execution models

ACM Transactions on Architecture and Code Optimization (TACO)
Coalition threading: combining traditional andnon-traditional parallelism to maximize scalability

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Control-Flow Decoupling

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.01

Visualization

Abstract

Current work in Simultaneous Multithreading provides little benefit to programs that aren't partitioned into threads. We propose Simultaneous Subordinate Microthreading (SSMT) to correct this by spawning subordinate threads that perform optimizations on behalf of the single primary thread. These threads, written in microcode, are issued and executed concurrently with the primary thread. They directly manipulate the microarchitecture to improve the primary thread's branch prediction accuracy, cache hit rate, and prefetch effectiveness. All contribute to the performance of the primary thread. This paper introduces SSMT and discusses its potential to increase performance. We illustrate its usefulness with an SSMT machine that executes subordinate microthreads to improve the branch prediction of the primary thread. We show simulation results for the SPECint95 benchmarks.