Software-Controlled Priority Characterization of POWER5 Processor

Authors:
Carlos Boneti;Francisco J. Cazorla;Roberto Gioiosa;Alper Buyuktosunoglu;Chen-Yong Cher;Mateo Valero
Venue:
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Year:
2008

Abstract

Due to the limitations of instruction-level parallelism, thread-level parallelism has become a popular way to improve processor performance. One example is the IBM POWER5™ processor, a dual-core chip in which each core is a two-context simultaneous multithreading (SMT) core. In each SMT core, the IBM POWER5 features two levels of thread resource balancing and prioritization. The first level provides automatic in-hardware resource balancing, while the second level is a software-controlled priority mechanism that offers eight levels of thread priority. Currently, software platforms use software-controlled prioritization in only a limited number of cases, because the performance effects of this mechanism have not been characterized. In this work, we characterize the effects of software-based prioritization on several different workloads. We show that the impact of prioritization depends significantly on the workloads coscheduled on a core. By prioritizing the right task, it is possible to more than double throughput for synthetic workloads compared to the baseline. We also present two application case studies targeting two different performance metrics: the first case study improves overall throughput by 23.7%, and the second reduces total execution time by 9.3%. In addition, we identify the circumstances under which a background thread can run transparently, without affecting the performance of the foreground thread.
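To give a feel for why a priority gap can shift throughput so strongly, the sketch below models how decode cycles might be split between the two threads of one core. It rests on an assumption that is not stated in the abstract: the rule commonly described for POWER5, in which a priority difference d defines a window of R = 2^(d+1) decode cycles, of which the lower-priority thread receives one and the higher-priority thread the rest. The function name `decode_share` is hypothetical, and the special behaviour of priorities 0 and 1 is deliberately left out.

```python
def decode_share(prio_a, prio_b):
    """Return (share_a, share_b): the fraction of decode cycles each thread gets.

    Assumed model (not taken from the abstract): with priority difference d,
    the window is R = 2 ** (d + 1) decode cycles; the lower-priority thread
    gets 1 of them and the higher-priority thread gets R - 1.
    Only priorities 2..7 are modelled; 0 and 1 are special cases omitted here.
    """
    for prio in (prio_a, prio_b):
        if not 2 <= prio <= 7:
            raise ValueError("this sketch only models priorities 2..7")

    window = 2 ** (abs(prio_a - prio_b) + 1)
    if prio_a == prio_b:
        # Equal priorities: window is 2 cycles, one for each thread.
        return 0.5, 0.5

    high = (window - 1) / window
    low = 1 / window
    return (high, low) if prio_a > prio_b else (low, high)


if __name__ == "__main__":
    print(decode_share(4, 4))  # equal priorities
    print(decode_share(5, 4))  # difference of 1
    print(decode_share(6, 2))  # difference of 4
```

Under this assumed rule, a difference of a single priority level already moves the split from 1:1 to 3:1, which is consistent with the abstract's observation that the effect depends heavily on which coscheduled task is prioritized.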