ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
IEEE Transactions on Computers
Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Dynamic Access Ordering for Streamed Computations
IEEE Transactions on Computers
Computer
Dynamically Controlled Resource Allocation in SMT Processors
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Maximizing CMP Throughput with Mediocre Cores
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
A Case for MLP-Aware Cache Replacement
Proceedings of the 33rd annual international symposium on Computer Architecture
POWER5 System microarchitecture
IBM Journal of Research and Development - POWER5 and packaging
Performance evaluation of exclusive cache hierarchies
ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Adaptive Caches: Effective Shaping of Cache Behavior to Workloads
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive insertion policies for high performance caching
Proceedings of the 34th annual international symposium on Computer architecture
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
Proceedings of the 36th annual international symposium on Computer architecture
POWER4 system microarchitecture
IBM Journal of Research and Development
Blue Gene/L compute chip: memory and Ethernet subsystem
IBM Journal of Research and Development
On the theory and potential of LRU-MRU collaborative cache management
Proceedings of the international symposium on Memory management
A generalized theory of collaborative caching
Proceedings of the 2012 international symposium on Memory Management
Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Temporal-based multilevel correlating inclusive cache replacement
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.01 |
Cache memories currently treat all blocks as if they were equally important. This assumption of equally important blocks is not always valid. For instance, not all blocks deserve to be in L1 cache. We therefore propose globalized block placement. We present a global placement algorithm for managing blocks in a cache hierarchy by deciding where in the hierarchy an incoming block should be placed. Our technique makes decisions by adapting to access patterns of different blocks. The contributions of this paper are fourfold. First, we motivate our solution by demonstrating the importance of a globalized placement scheme. Second, we present a method to categorize cache block behavior into one of four categories. Third, we present one potential design exploiting this categorization. Finally, we demonstrate the performance of our design. The proposed scheme enhances overall system performance (IPC) by an average of 12% over a traditional LRU scheme while reducing traffic between L1 cache and L2 cache by an average of 20%, using SPEC CPU benchmark suite. All of this is achieved with a table as small as 3 KB.