Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Strategies for achieving improved processor throughput
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The effectiveness of multiple hardware contexts
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Efficient fair queueing using deficit round-robin
IEEE/ACM Transactions on Networking (TON)
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
CPU reservations and time constraints: efficient, predictable scheduling of independent activities
Proceedings of the sixteenth ACM symposium on Operating systems principles
Simultaneous multithreading: maximizing on-chip parallelism
25 years of the international symposia on Computer architecture (selected papers)
Proceedings of the seventeenth ACM symposium on Operating systems principles
APRIL: a processor architecture for multiprocessing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Handling long-latency loads in a simultaneous multithreading processor
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A stateless, content-directed data prefetching mechanism
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)
Evaluation of Multithreaded Processors and Thread-Switch Policies
ISHPC '97 Proceedings of the International Symposium on High Performance Computing
A Multithreaded Processor Designed for Distributed Shared Memory Systems
APDC '97 Proceedings of the 1997 Advances in Parallel and Distributed Computing Conference (APDC '97)
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Power-Sensitive Multithreaded Architecture
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
The Impact of Resource Partitioning on SMT Processors
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Prophet/Critic Hybrid Branch Prediction
Proceedings of the 31st annual international symposium on Computer architecture
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Dynamically Controlled Resource Allocation in SMT Processors
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Chip Multithreading: Opportunities and Challenges
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
The Impact of Performance Asymmetry in Emerging Multicore Architectures
Proceedings of the 32nd annual international symposium on Computer Architecture
Perceptron-Based Branch Confidence Estimation
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Maximizing CMP Throughput with Mediocre Cores
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
The Cell Processor Architecture
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
uComplexity: Estimating Processor Design Effort
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Heterogeneous Chip Multiprocessors
Computer
An Instruction Fetch Policy Handling L2 Cache Misses in SMT Processors
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
A multithreaded PowerPC processor for commercial servers
IBM Journal of Research and Development
POWER4 system microarchitecture
IBM Journal of Research and Development
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A light-weight fairness mechanism for chip multiprocessor memory systems
Proceedings of the 6th ACM conference on Computing frontiers
Service level agreement for multithreaded processors
ACM Transactions on Architecture and Code Optimization (TACO)
A case for bufferless routing in on-chip networks
Proceedings of the 36th annual international symposium on Computer architecture
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Prefetch-aware shared resource management for multi-core systems
Proceedings of the 38th annual international symposium on Computer architecture
ACM Transactions on Computer Systems (TOCS)
FROCM: a fair and low-overhead method in SMT processor
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Shrinking l1 instruction caches to improve energy: delay in SMT embedded processors
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
REF: resource elasticity fairness with sharing incentives for multiprocessors
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
The need to reduce power and complexity will increase the interest in Switch on Event multithreading (coarse grained multithreading). Switch On Event multithreading is a low power and low complexity mechanism to improve processor throughput by switching threads on execution stalls. Fairness may, however, become a problem in a mul- tithreaded processor. Unless fairness is properly handled, some threads may starve while others consume all of the processor cycles. Heuristics that were devised in order to improve fairness in Simultaneous Multithreading are not applicable to Switch On Event multithreading. This paper defines the fairness metric using the ratio of the individ- ual threads' speedups, and shows how it can be enforced in Switch On Event multithreading. Fairness is controlled by forcing additional thread switch points. These switch points are determined dynamically by runtime estimation of the single threaded performance of each of the individ- ual threads. We analyze the impact of the fairness enforce- ment mechanism on throughput. We present simulation re- sults of the performance of Switch on Event multithread- ing. Switch on Event multithreading achieves an average speedup over single thread of 24% when no fairness is en- forced. In this case, over a third of our runs achieved poor fairness in which one thread ran extremely slowly (10 to 100 times slower than its single thread performance) while the other thread's performance was hardly affected. By using the proposed mechanism we can guarantee fairness of 1/4, 1/2 and 1 for a small performance loss of 2.2%, 3.7% and 7.2% respectively.