ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References
IEEE Transactions on Computers
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Threaded multiple path execution
Proceedings of the 25th annual international symposium on Computer architecture
Task selection for a multiscalar processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A dynamic multithreading processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor
ICS '01 Proceedings of the 15th international conference on Supercomputing
Skipper: a microarchitecture for exploiting control-flow independence
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Thread-Spawning Schemes for Speculative Multithreading
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
The STAMPede approach to thread-level speculation
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 19th annual international conference on Supercomputing
An embedded coherent-multithreading multimedia processor and its programming model
Proceedings of the 44th annual Design Automation Conference
Fetch-Criticality Reduction through Control Independence
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Journal of Parallel and Distributed Computing
IEEE Transactions on Circuits and Systems for Video Technology
International Journal of Parallel Programming
Supporting speculative multithreading on simultaneous multithreaded processors
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
SCIN-cache: Fast speculative versioning in multithreaded cores
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Hi-index | 0.00 |
This paper proposes the Implicitly-MultiThreaded (IMT) architecture to execute compiler-specified speculative threads on to a modified Simultaneous Multithreading pipeline. IMT reduces hardware complexity by relying on the compiler to select suitable thread spawning points and orchestrate inter-thread register communication. To enhance IMT's effectiveness, this paper proposes three novel microarchitectural mechanisms: (1) resource- and dependence-based fetch policy to fetch and execute suitable instructions, (2) context multiplexing to improve utilization and map as many threads to a single context as allowed by availability of resources, and (3) early threadinvocation to hide thread start-up overhead by overlapping one thread's invocation with other threads' execution.We use SPEC2K benchmarks and cycle-accurate simulation to show that an microarchitecture-optimized IMT improves performance on average by 24% and at best by 69% over an aggressive superscalar. We also compare IMT to two prior proposals, TME and DMT, for speculative threading on an SMT using hardware-extracted threads. Our best IMT design outperforms a comparable TME and DMT on average by 26% and 38% respectively.