Simultaneous multithreading architectures have previously been defined with fully shared execution resources. When one thread in such an architecture experiences a very long-latency operation, such as a load miss, the thread will eventually stall, potentially holding resources that other threads could be using to make forward progress. This paper shows that in many cases it is better to free the resources associated with a stalled thread than to keep that thread ready to begin execution immediately upon return of the loaded data. Several possible architectures are examined, and some simple solutions are shown to be very effective, achieving speedups close to 6.0 in some cases, and averaging 15% speedup with four threads and over 100% speedup with two threads running. Response times are cut in half for several workloads in open system experiments.
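The effect described in the abstract can be illustrated with a toy model. The sketch below is not the paper's simulator or any of the architectures it evaluates. It assumes two threads sharing one instruction queue, a single long load miss on thread 0, and alternating fetch. The function name `simulate` and all parameters (`queue_size`, `width`, `miss_latency`) are hypothetical. It also ignores the cost of refetching flushed instructions, so its numbers are not comparable to the reported speedups.

```python
def simulate(flush_on_miss, cycles=100, queue_size=16, width=4, miss_latency=80):
    """Toy shared-queue model (illustrative assumptions only).

    Thread 0 waits on one long-latency load for the first `miss_latency`
    cycles, and none of its instructions can issue during that time.
    Thread 1 never misses. Returns instructions issued per thread.
    """
    queued = [0, 0]   # instructions each thread holds in the shared queue
    issued = [0, 0]   # instructions each thread has issued so far

    for cycle in range(cycles):
        blocked = cycle < miss_latency  # thread 0 still waiting on its load

        # Policy under test: release the stalled thread's queue entries.
        if blocked and flush_on_miss:
            queued[0] = 0

        # Issue up to `width` instructions, alternating thread priority.
        slots = width
        first = cycle % 2
        for t in (first, 1 - first):
            if t == 0 and blocked:
                continue  # thread 0's instructions depend on the missing load
            n = min(queued[t], slots)
            queued[t] -= n
            issued[t] += n
            slots -= n

        # Fetch for one thread per cycle, alternating between threads.
        # With flushing, the stalled thread is not fetched until the load returns.
        t = cycle % 2
        if blocked and flush_on_miss:
            t = 1
        free = queue_size - sum(queued)
        queued[t] += min(width, free)

    return issued


if __name__ == "__main__":
    print("no flush:", simulate(flush_on_miss=False))
    print("flush:   ", simulate(flush_on_miss=True))
```

Without flushing, thread 0's unissuable instructions gradually fill the shared queue and leave thread 1 no free entries to fetch into. With flushing, thread 1 keeps the whole queue until the load returns.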