Reducing wire delay penalty through value prediction
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Improving Latency Tolerance of Multithreading through Decoupling
IEEE Transactions on Computers
An instruction set and microarchitecture for instruction level distributed processing
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Efficient Interconnects for Clustered Microarchitectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Temperature-aware microarchitecture
Proceedings of the 30th annual international symposium on Computer architecture
Improving dynamic cluster assignment for clustered trace cache processors
Proceedings of the 30th annual international symposium on Computer architecture
Temperature-aware microarchitecture: Modeling and implementation
ACM Transactions on Architecture and Code Optimization (TACO)
Application adaptive energy efficient clustered architectures
Proceedings of the 2004 international symposium on Low power electronics and design
On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures
IEEE Transactions on Parallel and Distributed Systems
Inherently Workload-Balanced Clustered Microarchitecture
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Cache organizations for clustered microarchitectures
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Boosting single-thread performance in multi-core systems through fine-grain multi-threading
Proceedings of the 36th annual international symposium on Computer architecture
WiDGET: Wisconsin decoupled grid execution tiles
Proceedings of the 37th annual international symposium on Computer architecture
Exploiting subtrace-level parallelism in clustered processors
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Empowering a helper cluster through data-width aware instruction selection policies
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Proceedings of the 26th ACM international conference on Supercomputing
Hi-index | 0.00 |
In current superscalar processors, all floating-point resources are idle during the execution of integer programs. As previous works show, this problem can be alleviated if the floating-point cluster is extended to execute simple integer instructions. With minor hardware modifications to a conventional superscalar, the issue width can potentially be doubled without increasing the hardware complexity. In fact, the result is a clustered architecture with two heterogeneous clusters.In this paper we propose to extend this architecture with a dynamic steering logic that sends the instructions to either cluster. The performance of clustered architectures depends on the inter-cluster communication overhead and the workload balance. We present a scheme that uses run-time information to optimise the trade-off between these figures. The evaluation shows that this scheme can achieve an average speed-up of 35% over a conventional 8-way issue (4 int + 4 fp) machine and that it outperforms the previous proposed one.