Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation
Proceedings of the 19th annual international conference on Supercomputing
Thread-Level Speculation on a CMP can be energy efficient
Proceedings of the 19th annual international conference on Supercomputing
While Chip Multiprocessors (CMPs) with Speculative Multithreading (SM) support have been gaining momentum, experienced processor designers in industry have reservations about their practical implementation. To deliver enough performance improvement on non-numerical applications, SM CMPs must exploit multiple sources of speculative task-level parallelism. Moreover, SM is widely felt to be too energy-inefficient to compete against conventional superscalars. This thesis is the first to challenge the commonly held view that SM consumes excessive energy. It presents a new energy-efficient CMP microarchitecture with SM support that is not only faster but also more energy efficient than a state-of-the-art wide-issue superscalar. To achieve these results, the thesis is also the first to propose microarchitectural mechanisms that, taken together, fundamentally enable fast SM with out-of-order spawn in a CMP. These simple mechanisms are Splitting Timestamp Intervals, the Immediate Successor List, and Dynamic Task Merging. To evaluate them, we develop an SM compiler with and without out-of-order spawn. In addition, the thesis identifies the sources of energy consumption in SM and proposes energy-centric optimizations that mitigate them. Experiments with the SPECint 2000 codes show that a CMP with four 3-issue cores and SM support delivers a speedup of 1.27 over a 3-issue superscalar. The SM CMP is even faster than a 6-issue superscalar at the same frequency, and consumes only 85% of its power. In fact, when both chips run at the same average power, the SM CMP is on average 1.13 times faster than the 6-issue superscalar.
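The abstract names Splitting Timestamp Intervals but does not describe how it works. The sketch below shows one plausible reading, offered only as an illustration under stated assumptions: each task owns an integer interval (hypothetical `lo`/`hi` fields), the sequential order of tasks is the order of their interval lower bounds, and spawning a child gives the child the upper half of the parent's remaining interval. Under these assumptions, a child spawned later by the same parent lands between the parent and its earlier-spawned children. That is the ordering needed when tasks are spawned out of order. The names `Task`, `spawn`, and `sequential_order` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task record: owns the half-open timestamp interval [lo, hi).
    name: str
    lo: int
    hi: int


def spawn(parent: Task, child_name: str) -> Task:
    """Split the parent's interval (assumed scheme, not the thesis's design).

    The child takes the upper half, so it is ordered after the parent. The
    parent keeps the lower half, so any child it spawns later is ordered
    between the parent and this child.
    """
    if parent.hi - parent.lo < 2:
        raise ValueError("interval exhausted; cannot split further")
    mid = (parent.lo + parent.hi) // 2
    child = Task(child_name, mid, parent.hi)
    parent.hi = mid
    return child


def sequential_order(tasks):
    # Sequential (commit) order is assumed to follow interval lower bounds.
    return [t.name for t in sorted(tasks, key=lambda t: t.lo)]


root = Task("P", 0, 1 << 16)
c1 = spawn(root, "C1")  # spawned first, but sequentially last
c2 = spawn(root, "C2")  # spawned later, ordered between P and C1
print(sequential_order([root, c1, c2]))  # ['P', 'C2', 'C1']
```

Halving the interval is a simplifying choice for the sketch. A real design would also have to handle interval exhaustion after many nested spawns, which this sketch only reports as an error.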
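The energy claim can be checked roughly from the figures stated above alone. Assume energy is average power times execution time, $E = P\,T$. Subscript $\mathrm{SM}$ denotes the SM CMP and subscript $6$ denotes the 6-issue superscalar. At the same frequency the abstract gives $P_{\mathrm{SM}} = 0.85\,P_{6}$ and $T_{\mathrm{SM}} < T_{6}$. In the equal-average-power comparison it gives $T_{\mathrm{SM}} = T_{6}/1.13$:

$$
\frac{E_{\mathrm{SM}}}{E_{6}} = \frac{P_{\mathrm{SM}}\,T_{\mathrm{SM}}}{P_{6}\,T_{6}} = 0.85\,\frac{T_{\mathrm{SM}}}{T_{6}} < 0.85 \quad \text{(same frequency)},
\qquad
\frac{E_{\mathrm{SM}}}{E_{6}} = \frac{T_{\mathrm{SM}}}{T_{6}} = \frac{1}{1.13} \approx 0.885 \quad \text{(equal average power)}.
$$

In both settings, the SM CMP would therefore use less energy than the 6-issue superscalar for the same work, under the $E = P\,T$ assumption.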