Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Widening resources: a cost-effective technique for aggressive ILP architectures
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Dynamic vectorization: a mechanism for exploiting far-flung ILP in ordinary programs
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Increasing effective IPC by exploiting distant parallelism
ICS '99 Proceedings of the 13th international conference on Supercomputing
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Speculative dynamic vectorization
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The Thrifty Barrier: Energy-Aware Synchronization in Shared-Memory Multiprocessors
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Techniques for Multicore Thermal Management: Classification and New Exploration
Proceedings of the 33rd annual international symposium on Computer Architecture
Multiple Instruction Stream Processor
Proceedings of the 33rd annual international symposium on Computer Architecture
Serialization-Aware Mini-Graphs: Performance with Fewer Resources
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Rhythm: harnessing data parallel hardware for server workloads
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
This work proposes Thread Fusion as an effective way of reducing power consumption when a Simultaneous Multi-Threaded (SMT) core is executing two threads from a homogeneous parallel application. Two dynamic instances of the same static instruction, each from a different thread are merged (fused) into a single instruction, consuming half of the resources of front-end pipeline stages. When the fused instruction is executed, it is cloned and it proceeds at full bandwidth. Our simulation results show average energy reduction of 10% with less than 1% impact on performance.