An elementary processor architecture with simultaneous instruction issuing from multiple threads
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
On pipelining dynamic instruction scheduling logic
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
ET2: a metric for time and energy efficiency of computation
Power aware computing
Energy efficient CMOS microprocessor design
HICSS '95 Proceedings of the 28th Hawaii International Conference on System Sciences
Out-of-Order Execution may not be Cost-Effective on Processors Featuring Simultaneous Multithreading
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Composable Lightweight Processors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Optimal Power/Performance Pipeline Depth for SMT in Scaled Technologies
IEEE Transactions on Computers
Trading off Cache Capacity for Reliability to Enable Low Voltage Operation
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Achieving Out-of-Order Performance with Almost In-Order Complexity
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Accelerating critical section execution with asymmetric multi-core architectures
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
WiDGET: Wisconsin decoupled grid execution tiles
Proceedings of the 37th annual international symposium on Computer architecture
Forwardflow: a scalable core for power-constrained CMPs
Proceedings of the 37th annual international symposium on Computer architecture
Federation: Boosting per-thread performance of throughput-oriented manycore architectures
ACM Transactions on Architecture and Code Optimization (TACO)
Dynamic vectorization in the E2 dynamic multicore architecture
ACM SIGARCH Computer Architecture News
Erasing Core Boundaries for Robust and Configurable Performance
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Bahurupi: A polymorphic heterogeneous multi-core architecture
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Integrated analysis of power and performance for pipelined microprocessors
IEEE Transactions on Computers
Flicker: a dynamically adaptive architecture for power limited multicore systems
Proceedings of the 40th Annual International Symposium on Computer Architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Enabling fair pricing on HPC systems with node sharing
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
When slower is faster: on heterogeneous multicores for reliable systems
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
The sharing architecture: sub-core configurability for IaaS clouds
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
Several researchers have recognized in recent years that today's workloads require a micro architecture that can handle single-threaded code at high performance, and multi-threaded code at high throughput, while consuming no more energy than is necessary. This paper proposes Morph Core, a unique approach to satisfying these competing requirements, by starting with a traditional high performance out-of-order core and making minimal changes that can transform it into a highly-threaded in-order SMT core when necessary. The result is a micro architecture that outperforms an aggressive 4-way SMT out-of-order core, "medium" out-of-order cores, small in-order cores, and Core Fusion. Compared to a 2-way SMT out-of-order core, Morph Core increases performance by 10% and reduces energy-delay-squared product by 22%.