Alternative implementations of two-level adaptive branch prediction
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Disjoint eager execution: an optimal form of speculative execution
Proceedings of the 28th annual international symposium on Microarchitecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Assigning confidence to conditional branch predictions
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Selective eager execution on the PolyPath architecture
Proceedings of the 25th annual international symposium on Computer architecture
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Multipath execution: opportunities and limits
ICS '98 Proceedings of the 12th international conference on Supercomputing
Improving prediction for procedure returns with return-address-stack repair mechanisms
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Simultaneous subordinate microthreading (SSMT)
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Reducing branch misprediction penalties via dynamic control independence detection
ICS '99 Proceedings of the 13th international conference on Supercomputing
Clustered speculative multithreaded processors
ICS '99 Proceedings of the 13th international conference on Supercomputing
Instruction fetch mechanisms for multipath execution processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The use of multithreading for exception handling
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Computers
CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit
Proceedings of the 27th annual international symposium on Computer architecture
α-coral: a multigrain, multithreaded processor architecture
ICS '01 Proceedings of the 15th international conference on Supercomputing
An analysis of operating system behavior on a simultaneous multithreaded architecture
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Execution-based prediction using speculative slices
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Speculative precomputation: long-range prefetching of delinquent loads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Dual path instruction processing
ICS '02 Proceedings of the 16th international conference on Supercomputing
Skipper: a microarchitecture for exploiting control-flow independence
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)
Weld: A Multithreading Technique Towards Latency-Tolerant VLIW Processors
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Design and Evaluation of Speculative Multi-threading with Selective Multi-Path Execution
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Limits of Task-Based Parallelism in Irregular Applications
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
The Case for Speculative Multithreading on SMT Processors
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Power-Aware Control Speculation through Selective Throttling
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Catching Accurate Profiles in Hardware
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Implicitly-multithreaded processors
Proceedings of the 30th annual international symposium on Computer architecture
An evaluation of speculative instruction execution on simultaneous multithreaded processors
ACM Transactions on Computer Systems (TOCS)
Multiple-path execution for chip multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Control Flow Optimization Via Dynamic Reconvergence Prediction
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 19th annual international conference on Supercomputing
Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
High-Performance and Low-Cost Dual-Thread VLIW Processor Using Weld Architecture Paradigm
IEEE Transactions on Parallel and Distributed Systems
Control Speculation for Energy-Efficient Next-Generation Superscalar Processors
IEEE Transactions on Computers
A case study of multi-threading in the embedded space
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Transparent control independence (TCI)
Proceedings of the 34th annual international symposium on Computer architecture
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Journal of Parallel and Distributed Computing
Speculative parallelization using state separation and multiple value prediction
Proceedings of the 2010 international symposium on Memory management
Dynamic branch prediction and control speculation
International Journal of High Performance Systems Architecture
International Journal of Parallel Programming
Hardware support for multithreaded execution of loops with limited parallelism
PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
Hi-index | 0.01 |
This paper presents Threaded Multi-Path Execution (TME), which exploits existing hardware on a Simultaneous Multi-threading (SMT) processor to speculatively execute multiple paths of execution. When there are fewer threads in an SMT processor than hardware contexts, threaded multi-path execution uses spare contexts to fetch and execute code along the less likely path of hard-to-predict branches.This paper describes the hardware mechanisms needed to enable an SMT processor to efficiently spawn speculative threads for threaded multi-path execution. The Mapping Synchronization Bus is described, which enables the spawning of these multiple paths. Policies are examined for deciding which branches to fork, and for managing competition between primary and alternate path threads for critical resources. Our results show that TME increases the single program performance of an SMT with eight thread contexts by 14%-23% on average, depending on the misprediction penalty for programs with a high misprediction rate.