Simultaneous multithreading (SMT) processors fetch instructions from several threads, which increases the instruction-level parallelism exposed to the processor. In an SMT processor, the fetch engine decides which threads enter the pipeline and which have priority in using resources. The fetch engine therefore determines how shared resources are allocated, and it plays a key role in the final performance of the machine. When a thread experiences an L2 cache miss, it can monopolise critical resources for a long time, throttling the execution of the remaining threads. Several approaches have been proposed to cope with this problem. The first contribution of this paper is an evaluation and comparison of the three best published policies addressing the long-latency load problem. The second, and main, contribution is a set of improved versions of these three policies. Our results show that the improved versions significantly outperform the original ones in both throughput and fairness.
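To make the role of the fetch engine concrete, the sketch below shows one generic way a per-cycle thread-selection policy can react to long-latency loads. It is an illustrative assumption, not one of the policies evaluated or proposed in this paper. Threads are ranked ICOUNT-style (fewest in-flight instructions first), and any thread with an outstanding L2 miss is gated so it stops filling shared queues while the miss is pending. The names `ThreadState`, `select_fetch_threads`, and the `fetch_width_threads` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreadState:
    tid: int               # hypothetical hardware thread id
    in_flight: int         # instructions in pre-issue stages (ICOUNT-style metric)
    pending_l2_miss: bool  # True while the thread has an outstanding L2 miss


def select_fetch_threads(threads: List[ThreadState],
                         fetch_width_threads: int = 2) -> List[int]:
    """Return up to fetch_width_threads thread ids to fetch from this cycle.

    Illustrative policy only (an assumption, not the paper's method):
    threads with a pending L2 miss are gated, and the remaining threads
    are ordered by fewest in-flight instructions, ties broken by tid.
    """
    eligible = [t for t in threads if not t.pending_l2_miss]
    eligible.sort(key=lambda t: (t.in_flight, t.tid))
    return [t.tid for t in eligible[:fetch_width_threads]]


if __name__ == "__main__":
    threads = [
        ThreadState(0, 30, False),
        ThreadState(1, 5, True),   # lowest count, but gated by its L2 miss
        ThreadState(2, 12, False),
        ThreadState(3, 12, False),
    ]
    print(select_fetch_threads(threads))  # [2, 3]
```

Without the gate, thread 1 would win priority because it has the fewest in-flight instructions, and its dependent instructions would then sit in shared queues until the miss resolves. If every thread is gated, this sketch fetches nothing that cycle. Real designs make different trade-offs here, such as flushing the missing thread or still allowing it a limited fetch share.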