Speculation techniques for improving load related instruction scheduling
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Inherently Lower-Power High-Performance Superscalar Architectures
IEEE Transactions on Computers
Partitioned first-level cache design for clustered microarchitectures
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Differential FCM: Increasing Value Prediction Accuracy by Improving Table Usage Efficiency
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Microarchitectural Trade-Offs in the Design of a Scalable Clustered Microprocessor
Microarchitectural Trade-Offs in the Design of a Scalable Clustered Microprocessor
Late-binding: enabling unordered load-store queues
Proceedings of the 34th annual international symposium on Computer architecture
Hi-index | 0.00 |
Cache memories are commonly implemented through multiple memory banks to improve bandwidth and latency. The early knowledge of the data cache bank that an instruction will access can help to improve the performance in several ways. One scenario that is likely to become increasingly important is clustered microprocessors with a distributed cache. This work presents a study of different cache bank predictors. We show that effective bank predictors can be implemented with relatively low cost. For instance, a predictor of approximately 4 Kbytes is shown to achieve an average hit rate of 78% for SPECint2000 when used to predict accesses to an 8-bank cache memory in a contemporary superscalar processor. We also show how a predictor can be used to reduce the communication latency caused by memory accesses in a clustered microarchitecture with a distributed cache design.