Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Increasing superscalar performance through multistreaming
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Dynamic IPC/clock rate optimization
Proceedings of the 25th annual international symposium on Computer architecture
Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Dynamic code partitioning for clustered architectures
International Journal of Parallel Programming - parallel architectures and compilation techniques, part II
Power and energy reduction via pipeline balancing
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Drowsy caches: simple techniques for reducing leakage power
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Handling long-latency loads in a simultaneous multithreading processor
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Tradeoffs in power-efficient issue queue design
Proceedings of the 2002 international symposium on Low power electronics and design
The Alpha 21264 Microprocessor
IEEE Micro
Partitioned first-level cache design for clustered microarchitectures
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Improving dynamic cluster assignment for clustered trace cache processors
Proceedings of the 30th annual international symposium on Computer architecture
Dynamically managing the communication-parallelism trade-off in future clustered processors
Proceedings of the 30th annual international symposium on Computer architecture
A Clustered Approach to Multithreaded Processors
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Complexity-effective superscalar processors
Complexity-effective superscalar processors
Inherently lower-power high-performance superscalar architectures
Inherently lower-power high-performance superscalar architectures
The Impact of Resource Partitioning on SMT Processors
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Understanding the energy efficiency of SMT and CMP with multiclustering
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Learning-Based SMT Processor Resource Distribution via Hill-Climbing
Proceedings of the 33rd annual international symposium on Computer Architecture
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Addressing thermal nonuniformity in SMT workloads
ACM Transactions on Architecture and Code Optimization (TACO)
Hill-climbing SMT processor resource distribution
ACM Transactions on Computer Systems (TOCS)
Boosting single-thread performance in multi-core systems through fine-grain multi-threading
Proceedings of the 36th annual international symposium on Computer architecture
Paired ROBs: A Cost-Effective Reorder Buffer Sharing Strategy for SMT Processors
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Hi-index | 0.00 |
Power consumption and wire delays are two important limiting factors for current and forthcoming processors. Monolithic designs that keep reasonable power consumption and operate at high clock frequencies are ever harder to implement. In this paper we propose a novel multithreaded clustered microarchitecture that consists of a clustered front-end capable of fetching instructions from multiple hreads and a clustered back-end where instructions are executed. This microarchitecture combines the concepts of multithreading and clustering to a tack both problems: power consumption and wire delays. A key aspect of this microarchitecture is the assignment of resources to the simultaneously running threads. We propose two back-end assignment schemes; in the Static Back-end Assignment (SBA)the back-ends are statically assigned to the front-ends, while in the Dynamic Back-end Assignment (DBA) the back-ends are dynamically assigned according to the demands of each front-end. A limit study of the potential performance of DBA shows a minor benefit compared to SBA. The causes why the DBA scheme does not perform as initially expected are investigated and the main limiting factors of this architecture are evaluated. Finally,we point out he advantages of DBA versus SBA.