Program behavior reveals that programs have different resource requirements in different phases, even over consecutive clock cycles. Running different programs on a fixed allocation of hardware resources is therefore inefficient, and sharing an appropriate degree of hardware can benefit programs. This paper proposes an architecture that shares function units between neighboring cores in a chip multiprocessor (CMP) to improve chip performance. Function units are central units of a core. They occupy little area and are not the performance-critical part of the core, but improving their utilization can improve the efficiency of other units and overall core performance. In our design, sharing priority guarantees that the local thread is not affected by threads from neighboring cores. Sharing latency is addressed by making the sharing decision early and by a direct data path between cores. The evaluation shows that the proposal benefits function-unit-intensive programs and lets other units operate more efficiently.
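The sharing-priority rule can be illustrated with a minimal per-cycle arbiter. This is a sketch under our own assumptions, not the paper's hardware design. `FURequest` and `arbitrate` are hypothetical names, each core is assumed to have a known number of free function units in a given cycle, and requests are served in arrival order. Local requests are always granted first, and only units the local thread leaves idle are offered to the neighboring core. In hardware the grant would presumably be computed ahead of time (the early share decision) so the neighbor's operands can travel over the direct data path; the sketch ignores that timing.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FURequest:
    """One operation waiting for a function unit (hypothetical model)."""
    core: int  # id of the core that issued the operation
    op: str    # label for the operation, e.g. "mul0"


def arbitrate(num_free_units: int,
              local: List[FURequest],
              neighbor: List[FURequest]) -> Tuple[List[FURequest], List[FURequest]]:
    """Grant this cycle's free function units on one core.

    Local requests are served first, so a neighbor can only use units the
    local thread leaves idle; the local thread never loses a unit to sharing.
    Returns (granted, deferred); deferred requests would retry next cycle.
    """
    granted_local = local[:num_free_units]
    # Units the local thread leaves idle this cycle (never negative).
    spare = num_free_units - len(granted_local)
    granted_neighbor = neighbor[:spare]
    deferred = local[num_free_units:] + neighbor[spare:]
    return granted_local + granted_neighbor, deferred


if __name__ == "__main__":
    local = [FURequest(0, "add0")]
    neighbor = [FURequest(1, "mul0"), FURequest(1, "mul1")]
    granted, deferred = arbitrate(2, local, neighbor)
    print([r.op for r in granted])   # ['add0', 'mul0']
    print([r.op for r in deferred])  # ['mul1']
```

Strict local priority is the simplest way to make the "local thread is not affected" guarantee hold by construction: a neighbor's request can only consume a unit that would otherwise sit idle in that cycle.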