Optimising long-latency-load-aware fetch policies for SMT processors

Authors:
Francisco J. Cazorla;Alex Ramirez;Mateo Valero;Enrique Fernandez
Affiliations:
Department of Computer Architecture, UPC, Jordi Girona 1-3, Barcelona D6. 08034, Spain;Department of Computer Architecture, UPC, Jordi Girona 1-3, Barcelona D6. 08034, Spain;Department of Computer Architecture, UPC, Jordi Girona 1-3, Barcelona D6. 08034, Spain;University of Las Palmas de Gran Canaria, Departamento de Informatica y Sistemas, Campus Universidad de Tafira, Las Palmas de Gran Canaria 35017, Spain
Venue:
International Journal of High Performance Computing and Networking
Year:
2004

Abstract

Simultaneous multithreading (SMT) processors fetch instructions from several threads, increasing the instruction-level parallelism exposed to the processor. In an SMT processor, the fetch engine decides which threads enter the processor and which have priority in using its resources. The fetch engine therefore determines how shared resources are allocated and plays a key role in the final performance of the machine. When a thread experiences an L2 cache miss, it can monopolise critical resources for a long time, throttling the execution of the remaining threads. Several approaches have been proposed to cope with this problem. The first contribution of this paper is an evaluation and comparison of the three best published policies that address the long-latency load problem. The second and main contribution is a set of improved versions of these three policies. Our results show that the improved versions significantly outperform the original ones in both throughput and fairness.