Learning-Based SMT Processor Resource Distribution via Hill-Climbing

Authors:
Seungryul Choi;Donald Yeung
Affiliations:
University of Maryland;University of Maryland
Venue:
Proceedings of the 33rd annual international symposium on Computer Architecture
Year:
2006

Citing 16
Cited 24

Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The SimpleScalar tool set, version 2.0

ACM SIGARCH Computer Architecture News
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor

SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Handling long-latency loads in a simultaneous multithreading processor

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Automatically characterizing large scale program behavior

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Design and evaluation of compiler algorithms for pre-execution

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Boosting SMT Performance by Speculation Control

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Transparent Threads: Resource Sharing in SMT Processors for High Single-Thread Performance

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
A Study of a Simultaneous Multithreaded Processor Implementation

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Front-End Policies for Improved Issue Efficiency in SMT Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Phase tracking and prediction

Proceedings of the 30th annual international symposium on Computer architecture
The Impact of Resource Partitioning on SMT Processors

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Back-end assignment schemes for clustered multithreaded processors

Proceedings of the 18th annual international conference on Supercomputing
Dynamically Controlled Resource Allocation in SMT Processors

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
IBM Power5 Chip: A Dual-Core Multithreaded Processor

IEEE Micro

An L2-miss-driven early register deallocation for SMT processors

Proceedings of the 21st annual international conference on Supercomputing
Software-Controlled Priority Characterization of POWER5 Processor

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
An adaptive resource partitioning algorithm for SMT processors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Reducing register pressure in SMT processors through L2-miss-driven early register release

ACM Transactions on Architecture and Code Optimization (TACO)
Hill-climbing SMT processor resource distribution

ACM Transactions on Computer Systems (TOCS)
Per-thread cycle accounting in SMT processors

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Memory-level parallelism aware fetch policies for simultaneous multithreading processors

ACM Transactions on Architecture and Code Optimization (TACO)
Coordinated management of multiple interacting resources in chip multiprocessors: A machine learning approach

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
The impact of speculative execution on SMT processors

International Journal of Parallel Programming
The Impact of Resource Sharing Control on the Design of Multicore Processors

ICA3PP '09 Proceedings of the 9th International Conference on Algorithms and Architectures for Parallel Processing
Paired ROBs: A Cost-Effective Reorder Buffer Sharing Strategy for SMT Processors

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Probabilistic job symbiosis modeling for SMT processor scheduling

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Dynamic warp subdivision for integrated branch and memory divergence tolerance

Proceedings of the 37th annual international symposium on Computer architecture
Dynamically managed multithreaded reconfigurable architectures for chip multiprocessors

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
A Predictive Model for Dynamic Microarchitectural Adaptivity Control

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
On the power management of simultaneous multithreading processors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Predictive coordination of multiple on-chip resources for chip multiprocessors

Proceedings of the international conference on Supercomputing
Managing SMT resource usage through speculative instruction window weighting

ACM Transactions on Architecture and Code Optimization (TACO)
Probabilistic modeling for job symbiosis scheduling on SMT processors

ACM Transactions on Architecture and Code Optimization (TACO)
Self-aware computing in the Angstrom processor

Proceedings of the 49th Annual Design Automation Conference
Making data prefetch smarter: adaptive prefetching on POWER7

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
L1-bandwidth aware thread allocation in multicore SMT processors

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Recalling instructions from idling threads to maximize resource utilization for simultaneous multi-threading processors

Computers and Electrical Engineering
Dynamic microarchitectural adaptation using machine learning

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

The key to high performance in Simultaneous Multithreaded (SMT) processors lies in optimizing the distribution of shared resources to active threads. Existing resource distribution techniques optimize performance only indirectly. They infer potential performance bottlenecks by observing indicators, like instruction occupancy or cache miss counts, and take actions to try to alleviate them. While the corrective actions are designed to improve performance, their actual performance impact is not known since end performance is never monitored. Consequently, potential performance gains are lost whenever the corrective actions do not effectively address the actual bottlenecks occurring in the pipeline. We propose a different approach to SMT resource distribution that optimizes end performance directly. Our approach observes the impact that resource distribution decisions have on performance at runtime, and feeds this information back to the resource distribution mechanisms to improve future decisions. By evaluating many different resource distributions, our approach tries to learn the best distribution over time. Because we perform learning on-line, learning time is crucial. We develop a hill-climbing algorithm that efficiently learns the best distribution of resources by following the performance gradient within the resource distribution space. This paper conducts an in-depth investigation of learningbased SMT resource distribution. First, we compare existing resource distribution techniques to an ideal learning-based technique that performs learning off-line. This limit study shows learning-based techniques can provide up to 19.2% gain over ICOUNT, 18.0% gain over FLUSH, and 7.6% gain over DCRA across 21 multithreaded workloads. Then, we present an on-line learning algorithm based on hill-climbing. Our evaluation shows hill-climbing provides a 12.4% gain over ICOUNT, 11.3% gain over FLUSH, and 2.4% gain over DCRA across a larger set of 42 multiprogrammed workloads.