Performance and power act as opposing constraints on the optimal pipeline depth of a processor. While increasing the pipeline depth may improve performance, the higher clock speed associated with a deeper pipeline also increases power dissipation. As simultaneous multithreading (SMT) becomes increasingly important for modern high-end processors, there is a need to quantify the optimal power-performance pipeline depth for SMT. Previous work has shown that SMT retains the performance-optimal pipeline depth in near-future technologies, but that result does not take power into account. Because changing the pipeline depth affects power and performance in different and interacting ways, it is difficult to predict how the optimal SMT pipeline depth scales when both are considered. Using simulations, we quantify the optimal SMT pipeline depths based on the well-known power-performance metric PD³ (power × delay³). Our analysis is novel and provides the following key results about the scaling trends for SMT pipelines considering both power and performance: (1) SMT has a deeper PD³-optimal pipeline than superscalar. (2) The PD³-optimal SMT pipeline depth increases with the number of programs. (3) The PD³-optimal SMT pipeline becomes shallower as technology scales, for a given number of programs.
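As an illustration of how a depth is selected under the power-delay-cubed metric, the following minimal sketch computes power × delay³ for a handful of candidate depths and picks the minimum. The function names, the depth values, and every power and delay number are hypothetical placeholders chosen for illustration; they are not results from the simulations described above.

```python
def pd3(power_watts, delay_seconds):
    """Power-delay-cubed metric (power x delay^3); lower is better."""
    return power_watts * delay_seconds ** 3


def pd3_optimal_depth(samples):
    """Return the depth whose (power, delay) pair minimizes PD^3.

    `samples` maps a pipeline depth to a (power, delay) tuple.
    """
    return min(samples, key=lambda depth: pd3(*samples[depth]))


def performance_optimal_depth(samples):
    """Return the depth with the smallest delay, ignoring power."""
    return min(samples, key=lambda depth: samples[depth][1])


# HYPOTHETICAL data: depth (in stages) -> (power in W, normalized delay).
# These values are invented to show the mechanics, not measured.
samples = {
    10: (40.0, 1.00),
    14: (50.0, 0.90),
    18: (62.0, 0.85),
    22: (80.0, 0.84),
}

best_pd3 = pd3_optimal_depth(samples)
best_perf = performance_optimal_depth(samples)
print("PD^3-optimal depth:", best_pd3)
print("Performance-optimal depth:", best_perf)
```

With these made-up inputs, the deepest pipeline has the smallest delay, but its extra power outweighs the small delay gain once both are folded into one metric, so a shallower depth minimizes power × delay³. This is the sense in which power and performance pull the optimal depth in opposite directions.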