The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Low-swing interconnect interface circuits
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
IEEE Transactions on Computers
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Proceedings of the 32nd annual international symposium on Computer Architecture
Cooperative Caching for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Reusability-aware cache memory sharing for chip multiprocessors with private L2 caches
Journal of Systems Architecture: the EUROMICRO Journal
Cache topology aware computation mapping for multicores
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Hi-index | 0.00 |
Chip multiprocessors (CMPs) emerge as a dominant architectural alternative in high-end embedded systems. Since off-chip accesses require a long latency and consume a large amount of power, CMPs are typically based on multiple levels of on-chip cache memories. To meet the performance demand and power budget, an efficient support for memory hierarchy is important. We propose an on-chip L2 cache organization which takes advantage of both a private L2 cache and a shared L2 cache to improve the performance and reduce energy consumption. Our L2 cache organization is based on a private L2 cache organization which has the short access latency. When a cache block in the private L2 cache is selected for an eviction, our proposed organization first evaluates the reusability of the cache block. If the cache block is likely to be reused, we save the evicted cache block in one of peer L2 caches which may have efficiently invalid blocks. By selectively writing evicted cache blocks to peer L2 caches, the proposed L2 cache organization can effectively simulate a shared L2 cache. Experimental results using a CMP simulator showed that the proposed L2 cache organization improved the average memory latency by up to 27% and reduced energy consumption by up to 16.6% over a 256KB private L2 cache organization for the SPLASH2 benchmark programs..