ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Pipeline gating: speculation control for energy reduction
Proceedings of the 25th annual international symposium on Computer architecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Clustered speculative multithreaded processors
ICS '99 Proceedings of the 13th international conference on Supercomputing
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
The Superthreaded Processor Architecture
IEEE Transactions on Computers
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
A static power model for architects
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
Removing architectural bottlenecks to the scalability of speculative parallelization
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Power-Aware Control Speculation through Selective Throttling
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Full chip leakage estimation considering power supply and temperature variations
Proceedings of the 2003 international symposium on Low power electronics and design
Improving Value Communication for Thread-Level Speculation
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Chip multiprocessors with speculative multithreading: design for performance and energy efficiency
Chip multiprocessors with speculative multithreading: design for performance and energy efficiency
Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation
Proceedings of the 19th annual international conference on Supercomputing
Energy-Efficient Thread-Level Speculation
IEEE Micro
Scalable and reliable communication for hardware transactional memory
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Dependence-aware transactional memory for increased concurrency
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Dynamic performance tuning for speculative threads
Proceedings of the 36th annual international symposium on Computer architecture
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
International Journal of Parallel Programming
Dynamically dispatching speculative threads to improve sequential execution
ACM Transactions on Architecture and Code Optimization (TACO)
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Chip Multiprocessors (CMP) with Thread-Level Speculation (TLS) have become the subject of intense research. However, TLS is suspected of being too energy inefficient to compete against conventional processors. In this paper, we refute this claim. To do so, we first identify the main sources of dynamic energy consumption in TLS. Then, we present simple energy-saving optimizations that cut the energy cost of TLS by over 60% on average with minimal performance impact. The resulting TLS CMP, populated with four 3-issue cores, speeds-up full SPECint 2000 codes by 1.27 on average, while keeping the fraction of the chip's energy consumption due to TLS to only 20%. Compared to a 6-issue superscalar at the same frequency, the TLS CMP is on average faster, while consuming only 85% of its total on-chip power.