Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Proceedings of the 14th international conference on Supercomputing
Circuits for wide-window superscalar processors
Proceedings of the 27th annual international symposium on Computer architecture
A circuit level implementation of an adaptive issue queue for power-aware microprocessors
GLSVLSI '01 Proceedings of the 11th Great Lakes symposium on VLSI
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Instruction flow-based front-end throttling for power-aware high-performance processors
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A high-speed dynamic instruction scheduling scheme for superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Handling long-latency loads in a simultaneous multithreading processor
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
The Alpha 21264 Microprocessor
IEEE Micro
Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Power-efficient issue queue design
Power aware computing
Power-Sensitive Multithreaded Architecture
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Early-stage definition of LPX: a low power issue-execute processor
PACS'02 Proceedings of the 2nd international conference on Power-aware computer systems
Predictable performance in SMT processors
Proceedings of the 1st conference on Computing frontiers
Dynamically Controlled Resource Allocation in SMT Processors
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Virtual multiprocessor: an analyzable, high-performance architecture for real-time computing
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
An Instruction Fetch Policy Handling L2 Cache Misses in SMT Processors
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Learning-Based SMT Processor Resource Distribution via Hill-Climbing
Proceedings of the 33rd annual international symposium on Computer Architecture
Adaptive reorder buffers for SMT processors
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Predictable Performance in SMT Processors: Synergy between the OS and SMTs
IEEE Transactions on Computers
Exploiting Operand Availability for Efficient Simultaneous Multithreading
IEEE Transactions on Computers
Journal of Parallel and Distributed Computing
An L2-miss-driven early register deallocation for SMT processors
Proceedings of the 21st annual international conference on Supercomputing
Optimising long-latency-load-aware fetch policies for SMT processors
International Journal of High Performance Computing and Networking
Software-Controlled Priority Characterization of POWER5 Processor
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
SAMOS '08 Proceedings of the 8th international workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
An adaptive resource partitioning algorithm for SMT processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Meeting points: using thread criticality to adapt multicore hardware to parallel regions
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Reducing register pressure in SMT processors through L2-miss-driven early register release
ACM Transactions on Architecture and Code Optimization (TACO)
Hill-climbing SMT processor resource distribution
ACM Transactions on Computer Systems (TOCS)
MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Memory-level parallelism aware fetch policies for simultaneous multithreading processors
ACM Transactions on Architecture and Code Optimization (TACO)
A swarm-inspired resource distribution for SMT processors
Proceedings of the 3rd International Conference on Bio-Inspired Models of Network, Information and Computing Sytems
Issue Mechanism for Embedded Simultaneous Multithreading Processor
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
The impact of speculative execution on SMT processors
International Journal of Parallel Programming
Paired ROBs: A Cost-Effective Reorder Buffer Sharing Strategy for SMT Processors
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
ACM Transactions on Architecture and Code Optimization (TACO)
Dynamically managed multithreaded reconfigurable architectures for chip multiprocessors
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Compatible phase co-scheduling on a CMP of multi-threaded processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Managing SMT resource usage through speculative instruction window weighting
ACM Transactions on Architecture and Code Optimization (TACO)
A phase adaptive cache hierarchy for SMT processors
Microprocessors & Microsystems
Enhancing ICOUNT2.8 fetch policy with better fairness for SMT processors
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
A fetch policy maximizing throughput and fairness for two-context SMT processors
APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
FROCM: a fair and low-overhead method in SMT processor
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Hi-index | 0.02 |
The performance and power optimization of dynamic superscalar microprocessors requires striking a careful balance between exploiting parallelism and hardware simplification. Hardware structures which are needlessly complex may exacerbate critical timing paths and dissipate extra power. One such structure requiring careful design is the issue queue. In a Simultaneous Multi-Threading (SMT) processor, it is particularly challenging to achieve issue queue simplification due to the increased utilization of the queue afforded by multi-threading.In this paper, we propose new front-end policies that reduce the required integer and floating point issue queue sizes in SMT processors. We explore both general policies as well as those directed towards alleviating a particular cause of issue queue inefficiency. For the same level of performance, the most effective policies reduce the issue queue occupancy by 33% for an SMT processor with appropriately-sized issue queue resources.