Earlier studies of simultaneous multithreaded (SMT) architectures showed that the performance of a realistic SMT architecture saturates early. This paper argues that a fixed hardware thread scheduling strategy cannot provide optimal results across different thread combinations. We propose an approach that performs part of the thread scheduling in a detector thread, at nominal hardware and software cost. The detector thread makes it possible to switch thread scheduling policies adaptively as conditions change. We show that there is considerable room for performance improvement with this adaptive dynamic thread scheduling approach: simulations of a realistic SMT architecture put the upper bound on the improvement at approximately 27% for SMT with eight contexts. This suggests that, given good heuristics for low-throughput detection and fetch policy selection, our approach may significantly improve performance.
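The idea of detecting low throughput and then switching the fetch policy can be illustrated with a minimal sketch. This is a toy model written under stated assumptions, not the mechanism evaluated in the paper: the class `DetectorThread`, the policy names in `POLICIES`, the IPC threshold, and the rule of rotating to the next policy are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical candidate fetch policies; the abstract does not name the ones used.
POLICIES = ["round_robin", "icount"]


@dataclass
class DetectorThread:
    """Toy model of a detector that switches fetch policy on low throughput."""

    low_ipc_threshold: float = 2.0  # assumed threshold, not taken from the paper
    policy_index: int = 0
    history: List[str] = field(default_factory=list)

    @property
    def policy(self) -> str:
        return POLICIES[self.policy_index]

    def observe(self, committed_instructions: int, cycles: int) -> str:
        """Record one sampling interval and return the policy for the next one."""
        ipc = committed_instructions / cycles
        if ipc < self.low_ipc_threshold:
            # Low throughput detected: rotate to the next candidate policy.
            self.policy_index = (self.policy_index + 1) % len(POLICIES)
        self.history.append(self.policy)
        return self.policy


detector = DetectorThread()
detector.observe(4000, 1000)  # IPC 4.0, above the threshold: policy is kept
detector.observe(1000, 1000)  # IPC 1.0, below the threshold: policy is switched
```

A real selection heuristic would presumably look at more than raw IPC, for example which resource is the bottleneck, before choosing a policy; the rotation rule here only shows where such a heuristic would plug in.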