The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Way-predicting set-associative cache for high performance and low energy consumption
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Drowsy caches: simple techniques for reducing leakage power
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Leakage Energy Management in Cache Hierarchies
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
A highly configurable cache architecture for embedded systems
Proceedings of the 30th annual international symposium on Computer architecture
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Single-vDD and single-vT super-drowsy techniques for low-leakage high-performance instruction caches
Proceedings of the 2004 international symposium on Low power electronics and design
IATAC: a smart predictor to turn-off L2 cache lines
ACM Transactions on Architecture and Code Optimization (TACO)
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
The reuse cache: downsizing the shared last-level cache
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Last level caches (LLC) play an important role in current and future chip multiprocessors, since they constitute the last opportunity to avoid expensive off-chip accesses. In a tiled CMP, the LLC is typically shared by all cores but physically distributed along the chip, thus providing a global banked capacity memory with high associativity. The memory hierarchy is orchestrated through a directory-based coherence protocol, typically associated to the LLC banks. The LLC (and directory structure) occupies a significant chip area and has a large contribution on the global chip leakage energy. To counter measure these effects, we provide in this paper a reorganization of the LLC cache and the directory by decoupling tag and data entry allocation, and by exploiting the high percentage of private data typically found in CMP systems. Private blocks are kept in L1 caches whereas LLC area is reorganized to reduce L2 entries while still allowing directory entries for private data, thus, maximizing on-chip memory reuse. This is achieved with no performance drop in terms of execution time. Evaluation results demonstrate a negligible impact on performance while achieving 45% of area saving and 75% of static power saving. For more aggressive designs, we achieve 80% area and 82% static power savings, while impacting performance by 10%.