Front-End Policies for Improved Issue Efficiency in SMT Processors

Authors:
Ali El-Moursy;David H. Albonesi
Affiliations:
-;-
Venue:
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Year:
2003

Citing 17
Cited 31

Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
A low-complexity issue logic

Proceedings of the 14th international conference on Supercomputing
Circuits for wide-window superscalar processors

Proceedings of the 27th annual international symposium on Computer architecture
A circuit level implementation of an adaptive issue queue for power-aware microprocessors

GLSVLSI '01 Proceedings of the 11th Great Lakes symposium on VLSI
Energy-effective issue logic

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Instruction flow-based front-end throttling for power-aware high-performance processors

ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Reducing power requirements of instruction scheduling through dynamic allocation of multiple datapath resources

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A high-speed dynamic instruction scheduling scheme for superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Handling long-latency loads in a simultaneous multithreading processor

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The MIPS R10000 Superscalar Microprocessor

IEEE Micro
The Alpha 21264 Microprocessor

IEEE Micro
Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Power-efficient issue queue design

Power aware computing
Power-Sensitive Multithreaded Architecture

ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Early-stage definition of LPX: a low power issue-execute processor

PACS'02 Proceedings of the 2nd international conference on Power-aware computer systems

Predictable performance in SMT processors

Proceedings of the 1st conference on Computing frontiers
Dynamically Controlled Resource Allocation in SMT Processors

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Virtual multiprocessor: an analyzable, high-performance architecture for real-time computing

Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
An Instruction Fetch Policy Handling L2 Cache Misses in SMT Processors

HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Learning-Based SMT Processor Resource Distribution via Hill-Climbing

Proceedings of the 33rd annual international symposium on Computer Architecture
Adaptive reorder buffers for SMT processors

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Predictable Performance in SMT Processors: Synergy between the OS and SMTs

IEEE Transactions on Computers
Exploiting Operand Availability for Efficient Simultaneous Multithreading

IEEE Transactions on Computers
Adaptive dynamic thread scheduling for simultaneous multithreaded architectures with a detector thread

Journal of Parallel and Distributed Computing
An L2-miss-driven early register deallocation for SMT processors

Proceedings of the 21st annual international conference on Supercomputing
Optimising long-latency-load-aware fetch policies for SMT processors

International Journal of High Performance Computing and Networking
Software-Controlled Priority Characterization of POWER5 Processor

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Energy-Efficient Simultaneous Thread Fetch from Different Cache Levels in a Soft Real-Time SMT Processor

SAMOS '08 Proceedings of the 8th international workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
An adaptive resource partitioning algorithm for SMT processors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Meeting points: using thread criticality to adapt multicore hardware to parallel regions

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Reducing register pressure in SMT processors through L2-miss-driven early register release

ACM Transactions on Architecture and Code Optimization (TACO)
Hill-climbing SMT processor resource distribution

ACM Transactions on Computer Systems (TOCS)
MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Memory-level parallelism aware fetch policies for simultaneous multithreading processors

ACM Transactions on Architecture and Code Optimization (TACO)
A swarm-inspired resource distribution for SMT processors

Proceedings of the 3rd International Conference on Bio-Inspired Models of Network, Information and Computing Systems
Issue Mechanism for Embedded Simultaneous Multithreading Processor

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
The impact of speculative execution on SMT processors

International Journal of Parallel Programming
Paired ROBs: A Cost-Effective Reorder Buffer Sharing Strategy for SMT Processors

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Thread-management techniques to maximize efficiency in multicore and simultaneous multithreaded microprocessors

ACM Transactions on Architecture and Code Optimization (TACO)
Dynamically managed multithreaded reconfigurable architectures for chip multiprocessors

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Compatible phase co-scheduling on a CMP of multi-threaded processors

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Managing SMT resource usage through speculative instruction window weighting

ACM Transactions on Architecture and Code Optimization (TACO)
A phase adaptive cache hierarchy for SMT processors

Microprocessors & Microsystems
Enhancing ICOUNT2.8 fetch policy with better fairness for SMT processors

ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
A fetch policy maximizing throughput and fairness for two-context SMT processors

APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
FROCM: a fair and low-overhead method in SMT processor

HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications

Quantified Score

Hi-index	0.02

Abstract

The performance and power optimization of dynamic superscalar microprocessors requires striking a careful balance between exploiting parallelism and hardware simplification. Hardware structures which are needlessly complex may exacerbate critical timing paths and dissipate extra power. One such structure requiring careful design is the issue queue. In a Simultaneous Multi-Threading (SMT) processor, it is particularly challenging to achieve issue queue simplification due to the increased utilization of the queue afforded by multi-threading.

In this paper, we propose new front-end policies that reduce the required integer and floating point issue queue sizes in SMT processors. We explore both general policies as well as those directed towards alleviating a particular cause of issue queue inefficiency. For the same level of performance, the most effective policies reduce the issue queue occupancy by 33% for an SMT processor with appropriately-sized issue queue resources.