Clustered processors lose performance to clustering-induced stalls, which arise from distributed resources and inter-cluster communication delays. Our performance analysis of clustered architectures shows that previously proposed methods reduce one group of stalls at the expense of the other. We extend previous work and present a new class of cluster assignment heuristics for high-performance clustered processors. We show that performance in clustered processors can be improved by taking a more balanced approach to clustering-induced stalls. Our techniques rely on estimating and predicting resource utilization in clustered processors. On average, our best technique narrows the performance gap between a dual-clustered and a centralized processor to 6.9% and 9.2% for 8-way and 6-way processors, respectively, on a representative subset of SPEC2K benchmarks.
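To make the trade-off concrete, the following is a minimal sketch of a cluster assignment heuristic that weighs communication stalls against resource stalls. It is an illustration only and not the heuristics evaluated in the paper: the function name `assign_cluster`, the cost formula, and the parameters `comm_penalty` and `load_weight` are all assumptions introduced here.

```python
def assign_cluster(sources, producer, occupancy, capacity,
                   comm_penalty=1.0, load_weight=2.0):
    """Pick a cluster for one instruction (hypothetical toy model).

    sources   -- names of the instruction's source registers
    producer  -- dict mapping a register name to the cluster index
                 that produced its latest value
    occupancy -- list with the number of issue-queue entries in use
                 per cluster
    capacity  -- issue-queue entries available in each cluster
    Returns the chosen cluster index, or None if every cluster is
    full (which would be a dispatch stall in this model).
    """
    best, best_cost = None, float("inf")
    for c in range(len(occupancy)):
        if occupancy[c] >= capacity:
            continue  # no free entry: this cluster cannot accept work
        # Communication term: operands whose value lives in another cluster.
        # Registers with no recorded producer are treated as local.
        remote = sum(1 for r in sources if producer.get(r, c) != c)
        # Resource term: fraction of the cluster's entries already in use.
        cost = comm_penalty * remote + load_weight * occupancy[c] / capacity
        if cost < best_cost:
            best, best_cost = c, cost
    return best
```

With lightly loaded clusters the sketch keeps an instruction next to its producers; as one cluster fills up, the resource term grows until paying for an inter-cluster transfer becomes the cheaper choice. Setting `load_weight` to zero gives a purely dependence-based policy and setting `comm_penalty` to zero gives a purely load-balancing policy, the two extremes that each reduce one group of stalls at the expense of the other.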