A Cost-Effective Clustered Architecture

Authors:
Ramon Canal;Joan-Manuel Parcerisa;Antonio Gonzalez
Venue:
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Year:
1999

Cited 17

Reducing wire delay penalty through value prediction

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Improving Latency Tolerance of Multithreading through Decoupling

IEEE Transactions on Computers
An instruction set and microarchitecture for instruction level distributed processing

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Efficient Interconnects for Clustered Microarchitectures

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Temperature-aware microarchitecture

Proceedings of the 30th annual international symposium on Computer architecture
Improving dynamic cluster assignment for clustered trace cache processors

Proceedings of the 30th annual international symposium on Computer architecture
Temperature-aware microarchitecture: Modeling and implementation

ACM Transactions on Architecture and Code Optimization (TACO)
Application adaptive energy efficient clustered architectures

Proceedings of the 2004 international symposium on Low power electronics and design
On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures

IEEE Transactions on Parallel and Distributed Systems
Inherently Workload-Balanced Clustered Microarchitecture

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Cache organizations for clustered microarchitectures

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Core fusion: accommodating software diversity in chip multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
Boosting single-thread performance in multi-core systems through fine-grain multi-threading

Proceedings of the 36th annual international symposium on Computer architecture
WiDGET: Wisconsin decoupled grid execution tiles

Proceedings of the 37th annual international symposium on Computer architecture
Exploiting subtrace-level parallelism in clustered processors

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Empowering a helper cluster through data-width aware instruction selection policies

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Overcoming single-thread performance hurdles in the core fusion reconfigurable multicore architecture

Proceedings of the 26th ACM international conference on Supercomputing

Abstract

In current superscalar processors, all floating-point resources are idle during the execution of integer programs. As previous works show, this problem can be alleviated if the floating-point cluster is extended to execute simple integer instructions. With minor hardware modifications to a conventional superscalar, the issue width can potentially be doubled without increasing the hardware complexity. In fact, the result is a clustered architecture with two heterogeneous clusters.

In this paper we propose to extend this architecture with a dynamic steering logic that sends the instructions to either cluster. The performance of clustered architectures depends on the inter-cluster communication overhead and the workload balance. We present a scheme that uses run-time information to optimise the trade-off between these two factors. The evaluation shows that this scheme can achieve an average speed-up of 35% over a conventional 8-way issue (4 int + 4 fp) machine and that it outperforms the previously proposed one.
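
The trade-off the abstract describes, between inter-cluster communication and workload balance, can be illustrated with a small sketch. This is not the steering scheme evaluated in the paper, whose details the abstract does not give. It is a generic heuristic under stated assumptions: two clusters numbered 0 and 1, a per-register record of which cluster holds the latest value, and a hypothetical `imbalance_limit` threshold. All names (`SteeringState`, `steer`, `imbalance_limit`) are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SteeringState:
    # Hypothetical bookkeeping, not taken from the paper.
    reg_home: dict = field(default_factory=dict)  # register -> cluster holding its value
    pending: list = field(default_factory=lambda: [0, 0])  # instructions steered per cluster
    copies: int = 0  # inter-cluster operand copies incurred so far


def steer(state, dest, srcs, imbalance_limit=4):
    """Pick a cluster (0 or 1) for one instruction that both clusters can execute."""
    # Count how many source operands currently live in each cluster.
    votes = [0, 0]
    for reg in srcs:
        if reg in state.reg_home:
            votes[state.reg_home[reg]] += 1

    least_loaded = 0 if state.pending[0] <= state.pending[1] else 1
    imbalance = abs(state.pending[0] - state.pending[1])

    if imbalance > imbalance_limit or votes[0] == votes[1]:
        # Balance wins when the clusters are badly skewed,
        # or when the operands express no preference.
        cluster = least_loaded
    else:
        # Otherwise follow the operands to avoid communication.
        cluster = 0 if votes[0] > votes[1] else 1

    # Each source operand held by the other cluster costs one copy.
    state.copies += votes[1 - cluster]
    state.pending[cluster] += 1
    state.reg_home[dest] = cluster
    return cluster
```

For simplicity the sketch never decrements `pending` when instructions complete, so it models only the steering decision, not a full pipeline. Raising `imbalance_limit` favours fewer copies at the cost of a more skewed load; lowering it does the opposite.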