Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Correlated load-address predictors
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Dynamically managing the communication-parallelism trade-off in future clustered processors
Proceedings of the 30th annual international symposium on Computer architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Composable Lightweight Processors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Achieving Out-of-Order Performance with Almost In-Order Complexity
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Dynamic heterogeneity and the need for multicore virtualization
ACM SIGOPS Operating Systems Review
Boosting single-thread performance in multi-core systems through fine-grain multi-threading
Proceedings of the 36th annual international symposium on Computer architecture
Proceedings of the 7th ACM international conference on Computing frontiers
Forwardflow: a scalable core for power-constrained CMPs
Proceedings of the 37th annual international symposium on Computer architecture
Executing sequential programs across multiple cores is crucial for exploiting instruction-level parallelism (ILP) in chip multiprocessor (CMP) architectures. One widely used method steers instructions across cores based on their dependencies. However, this method requires a sophisticated steering mechanism, which adds considerable hardware complexity and die area overhead. This paper presents the Global Register Alias Table (GRAT), a structure that can be used in CMP architectures to facilitate sequential program execution across cores. The GRAT drastically reduces the area overhead and design complexity of instruction steering without requiring additional programming effort or compiler support. Dynamic reconfiguration is also implemented to support efficient execution of parallel programs. In our evaluation, our design performs within 5.9% of Core Fusion, a recent design that requires a complex steering unit.