Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A Delay Model and Speculative Architecture for Pipelined Routers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Power-driven Design of Router Microarchitectures in On-chip Networks
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Interconnect design considerations for large NUCA caches
Proceedings of the 34th annual international symposium on Computer architecture
Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite
Proceedings of the 34th annual international symposium on Computer architecture
A Domain-Specific On-Chip Network Design for Large Scale Cache Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
IBM Journal of Research and Development
LP-NUCA: networks-in-cache for high-performance low-power embedded processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.00 |
To deal with the "memory wall" problem, microprocessors include large secondary on-chip caches. But as these caches enlarge, they originate a new latency gap between them and fast L1 caches (inter-cache latency gap). Recently, NonUniform Cache Architectures (NUCAs) have been proposed to sustain the size growth trend of secondary caches that is threatened by wire-delay problems. NUCAs are size-oriented, and they were not conceived to close the inter-cache latency gap. To tackle this problem, we propose Light NUCAs (L-NUCAs) leveraging on-chip wire density to interconnect small tiles through specialized networks, which convey packets with distributed and dynamic routing. Our design reduces the tile delay (cache access plus one-hop routing) to a single processor cycle and places cache lines at a finer granularity than conventional caches, reducing cache latency. Our evaluations show that in general, an L-NUCA improves simultaneously performance, energy, and area when integrated into both conventional or D-NUCA hierarchies.