An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
In massive tile-based multi-core architectures, it is important that the execution of the packets of a particular flow takes place in a set of cores physically close to each other in order to minimize the average latency to the common data structures across the local caches of the different cores. An static NUCA implementation provides a substrate for a cost-effective implementation of a cache sharing mechanism. However, a careful mapping of the different data structures in the system's memory, along with a smart load-balancing mechanism of the packets to the different cores, is fundamental in order to avoid long latencies to remote data. This work proposes a methodology for load balancing packets to cores in an S-NUCA tile-based architecture with a large number of cores.