The effect of speculatively updating branch history on branch prediction accuracy, revisited
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Proceedings of the 24th annual international symposium on Computer architecture
Dynamic IPC/clock rate optimization
Proceedings of the 25th annual international symposium on Computer architecture
A dynamic multithreading processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
Slipstream processors: improving both performance and fault tolerance
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Reconfigurable computing: a survey of systems and software
ACM Computing Surveys (CSUR)
Managing multi-configuration hardware via dynamic working set analysis
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Positional adaptation of processors: application to energy reduction
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the 30th annual international symposium on Computer architecture
Multiscalar Processors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Energy-efficient issue queue design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
Dynamically Trading Frequency for Complexity in a GALS Microprocessor
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
The Impact of Performance Asymmetry in Emerging Multicore Architectures
Proceedings of the 32nd annual international symposium on Computer Architecture
Core architecture optimization for heterogeneous chip multiprocessors
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
Previous studies have proposed techniques to dynamically change the architecture of a processor to better suit the characteristics of the workload at hand. However, all such approaches are prone to a fundamental trade-off between the architectural diversity they can provide and the latency of architectural change, their fixed-configuration performance and the complexity of finding the best architectural configuration for the workload at hand. In this study we argue that the full potential of dynamic architectural customization can only be achieved by diminishing the effect of the degree of available architectural diversity on the aforementioned performance factors. The performance of a statically designed processing core in a heterogeneous multi-core system is independent of the architectural diversity available. In addition, it is apparent that concurrent execution of code on differently architected cores automatically reveals which architecture is more suitable for the characteristics of a particular workload. We therefore propose architectural contesting; the redundant execution of code on a number of differently architected processors (each customized for a different set of workload characteristics) in a leader follower arrangement, such that the leader and follower cores continuously shift roles as one core or the other becomes more favorable for new code phases. In this manner effective execution is naturally transferred from one static architecture to the other with little latency. In this study, we show that the contesting of only processor width can yield an average speedup of 7.5% and up to 12.5% in integer SPEC benchmarks.