Improving dynamic cluster assignment for clustered trace cache processors

Authors:
Ravi Bhargava;Lizy K. John
Affiliations:
The University of Texas at Austin;The University of Texas at Austin
Venue:
Proceedings of the 30th annual international symposium on Computer architecture
Year:
2003

Citing 12
Cited 6

The multiscalar architecture

The multiscalar architecture
Trace cache: a low latency approach to high bandwidth instruction fetching

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
The multicluster architecture: reducing cycle time through partitioning

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Putting the fill unit to work: dynamic optimizations for trace cache microprocessors

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Inherently Lower-Power High-Performance Superscalar Architectures

IEEE Transactions on Computers
Latency and energy aware value prediction for high-frequency processors

ICS '02 Proceedings of the 16th international conference on Supercomputing
Efficient Interconnects for Clustered Microarchitectures

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Instruction Pre-Processing in Trace Processors

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
A Cost-Effective Clustered Architecture

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research

IEEE Computer Architecture Letters

Back-end assignment schemes for clustered multithreaded processors

Proceedings of the 18th annual international conference on Supercomputing
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors

IEEE Transactions on Parallel and Distributed Systems
A Criticality Analysis of Clustering in Superscalar Processors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Core fusion: accommodating software diversity in chip multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

This work examines dynamic cluster assignment for a clustered trace cache processor (CTCP). Previously proposed cluster assignment techniques run into unique problems as issue width and cluster count increase. Realistic design conditions, such as variable data forwarding latencies between clusters and a heavily partitioned instruction window, increase the degree of difficulty for effective cluster assignment.In this work, the trace cache and fill unit are used to perform dynamic cluster assignment. The retire-time fill unit analysis is aided by a dynamic profiling mechanism embedded within the trace cache. This mechanism provides information about inter-trace data dependencies, an element absent in previous retire-time CTCP cluster assignment work. The strategy proposed in this work leads to more intra-cluster data forwarding and shorter data forwarding distances. In addition, performing cluster assignment at retire time reduces issue-time complexity and eliminates early pipeline stages. This increases overall performance for integer programs by 11.5% over our base CTCP architecture. This speedup is significantly higher than a previously proposed retire-time CTCP assignment strategy. Dynamic cluster assignment is also evaluated for several alternate cluster designs as well as for media benchmarks.