Fairness and Throughput in Switch on Event Multithreading

Authors:
Ron Gabor;Shlomo Weiss;Avi Mendelson
Affiliations:
Tel Aviv University, Israel;Tel Aviv University, Israel;Intel, Israel
Venue:
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2006

Citing 36
Cited 10

Comparative evaluation of latency reducing and tolerating techniques

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Strategies for achieving improved processor throughput

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The effectiveness of multiple hardware contexts

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The MIT Alewife machine: architecture and performance

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Efficient fair queueing using deficit round-robin

IEEE/ACM Transactions on Networking (TON)
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
CPU reservations and time constraints: efficient, predictable scheduling of independent activities

Proceedings of the sixteenth ACM symposium on Operating systems principles
Simultaneous multithreading: maximizing on-chip parallelism

25 years of the international symposia on Computer architecture (selected papers)
Borrowed-virtual-time (BVT) scheduling: supporting latency-sensitive threads in a general-purpose scheduler

Proceedings of the seventeenth ACM symposium on Operating systems principles
APRIL: a processor architecture for multiprocessing

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor

SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Handling long-latency loads in a simultaneous multithreading processor

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A stateless, content-directed data prefetching mechanism

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A survey of processors with explicit multithreading

ACM Computing Surveys (CSUR)
Evaluation of Multithreaded Processors and Thread-Switch Policies

ISHPC '97 Proceedings of the International Symposium on High Performance Computing
A Multithreaded Processor Designed for Distributed Shared Memory Systems

APDC '97 Proceedings of the 1997 Advances in Parallel and Distributed Computing Conference (APDC '97)
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Power-Sensitive Multithreaded Architecture

ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
The Impact of Resource Partitioning on SMT Processors

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Prophet/Critic Hybrid Branch Prediction

Proceedings of the 31st annual international symposium on Computer architecture
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Dynamically Controlled Resource Allocation in SMT Processors

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Chip Multithreading: Opportunities and Challenges

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Montecito: A Dual-Core, Dual-Thread Itanium Processor

IEEE Micro
Niagara: A 32-Way Multithreaded Sparc Processor

IEEE Micro
The Impact of Performance Asymmetry in Emerging Multicore Architectures

Proceedings of the 32nd annual international symposium on Computer Architecture
Perceptron-Based Branch Confidence Estimation

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Multi-Core to the Masses

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Maximizing CMP Throughput with Mediocre Cores

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
The Cell Processor Architecture

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
uComplexity: Estimating Processor Design Effort

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Heterogeneous Chip Multiprocessors

Computer
An Instruction Fetch Policy Handling L2 Cache Misses in SMT Processors

HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
A multithreaded PowerPC processor for commercial servers

IBM Journal of Research and Development
POWER4 system microarchitecture

IBM Journal of Research and Development

Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A light-weight fairness mechanism for chip multiprocessor memory systems

Proceedings of the 6th ACM conference on Computing frontiers
Service level agreement for multithreaded processors

ACM Transactions on Architecture and Code Optimization (TACO)
A case for bufferless routing in on-chip networks

Proceedings of the 36th annual international symposium on Computer architecture
Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Prefetch-aware shared resource management for multi-core systems

Proceedings of the 38th annual international symposium on Computer architecture
Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multicore Memory Systems

ACM Transactions on Computer Systems (TOCS)
FROCM: a fair and low-overhead method in SMT processor

HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Shrinking l1 instruction caches to improve energy: delay in SMT embedded processors

ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
REF: resource elasticity fairness with sharing incentives for multiprocessors

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

The need to reduce power and complexity will increase the interest in Switch on Event multithreading (coarse grained multithreading). Switch On Event multithreading is a low power and low complexity mechanism to improve processor throughput by switching threads on execution stalls. Fairness may, however, become a problem in a mul- tithreaded processor. Unless fairness is properly handled, some threads may starve while others consume all of the processor cycles. Heuristics that were devised in order to improve fairness in Simultaneous Multithreading are not applicable to Switch On Event multithreading. This paper defines the fairness metric using the ratio of the individ- ual threads' speedups, and shows how it can be enforced in Switch On Event multithreading. Fairness is controlled by forcing additional thread switch points. These switch points are determined dynamically by runtime estimation of the single threaded performance of each of the individ- ual threads. We analyze the impact of the fairness enforce- ment mechanism on throughput. We present simulation re- sults of the performance of Switch on Event multithread- ing. Switch on Event multithreading achieves an average speedup over single thread of 24% when no fairness is en- forced. In this case, over a third of our runs achieved poor fairness in which one thread ran extremely slowly (10 to 100 times slower than its single thread performance) while the other thread's performance was hardly affected. By using the proposed mechanism we can guarantee fairness of 1/4, 1/2 and 1 for a small performance loss of 2.2%, 3.7% and 7.2% respectively.