In Simultaneous Multithreading (SMT) processors, co-scheduled threads share the processor's resources while simultaneously competing for them. A thread that misses in the L2 cache may occupy most of the available resources for a long time, causing other threads to run slower than they otherwise could, or even to stall for lack of resources. As a result, overall SMT processor performance is degraded. In this paper, we propose a novel fetch policy called MFP (Multiple Fetch Priorities) to mitigate the negative effects of L2 cache misses. Under our policy, each thread is assigned one of three fetch priority levels based on its cache behavior, and each cycle MFP fetches instructions from the threads with the highest priority. Results show that our policy outperforms previously proposed fetch policies for all types of workloads, especially memory-bound ones, whether measured by IPC or by the harmonic mean. The results also show varying degrees of improvement over the other fetch policies: the gain over PDG is greatest, averaging 19.2% in throughput and 27.7% in Hmean.
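The priority mechanism described above can be sketched as follows. This is a minimal illustrative model, not the paper's implementation: the demotion rules (demote on an outstanding miss, restore when it resolves), the class and function names, and the use of L1 misses for the middle level are all assumptions made for the sake of the example.

```python
# Hypothetical sketch of an MFP-style fetch policy with three priority
# levels per thread. The exact promotion/demotion rules used by MFP are
# not reproduced here; these are illustrative assumptions.

HIGH, MEDIUM, LOW = 0, 1, 2  # smaller value = higher fetch priority


class Thread:
    def __init__(self, tid):
        self.tid = tid
        self.priority = HIGH

    def update_priority(self, l1_miss_pending, l2_miss_pending):
        # Assumed rule: a thread with an outstanding L2 miss is demoted
        # to the lowest level, an L1 miss to the middle level, and the
        # thread is restored to HIGH once its misses resolve.
        if l2_miss_pending:
            self.priority = LOW
        elif l1_miss_pending:
            self.priority = MEDIUM
        else:
            self.priority = HIGH


def select_fetch_threads(threads):
    """Return the threads sharing the highest (numerically smallest)
    priority; the fetch stage takes instructions only from these
    threads in the current cycle."""
    best = min(t.priority for t in threads)
    return [t for t in threads if t.priority == best]


if __name__ == "__main__":
    threads = [Thread(0), Thread(1), Thread(2)]
    threads[0].update_priority(False, True)   # pending L2 miss -> LOW
    threads[1].update_priority(True, False)   # pending L1 miss -> MEDIUM
    threads[2].update_priority(False, False)  # no misses -> HIGH
    chosen = select_fetch_threads(threads)
    print([t.tid for t in chosen])  # only thread 2 is fetched this cycle
```

The intuition the sketch captures is that a thread stalled on a long-latency L2 miss cannot make progress anyway, so withholding fetch bandwidth from it leaves resources for threads that can.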