Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Speculation techniques for improving load related instruction scheduling
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Reducing wire delay penalty through value prediction
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
An interleaved cache clustered VLIW processor
ICS '02 Proceedings of the 16th international conference on Supercomputing
An instruction set and microarchitecture for instruction level distributed processing
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
The Alpha 21264 Microprocessor
IEEE Micro
Efficient Interconnects for Clustered Microarchitectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Partitioned first-level cache design for clustered microarchitectures
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
A Cost-Effective Clustered Architecture
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Dynamically managing the communication-parallelism trade-off in future clustered processors
Proceedings of the 30th annual international symposium on Computer architecture
ICCD '03 Proceedings of the 21st International Conference on Computer Design
A Criticality Analysis of Clustering in Superscalar Processors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Independent front-end and back-end dynamic voltage scaling for a GALS microarchitecture
Proceedings of the 2006 international symposium on Low power electronics and design
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Hi-index | 0.00 |
Clustered microarchitectures are an effective organization to deal with the problem of wire delays and complexity by partitioning some of the processor resources. The organization of the data cache is a key factor in these processors due to its effect on cache miss rate and inter-cluster communications. This paper investigates alternative designs of the data cache: centralized, distributed, replicated and physically distributed cache architectures are analyzed. Results show similar average performance but significant performance variations depending on the application features, specially cache miss ratio and communications. In addition, we also propose a novel instruction steering scheme in order to reduce communications. This scheme conditionally stalls the dispatch of instructions depending on the occupancy of the clusters, whenever the current instruction cannot be steered to the cluster holding most of the inputs. This new steering outperforms traditional schemes. Results show, an average speedup of 5% and up to 15% for some applications.