Improving hardware cache performance has become increasingly important because the performance gap between processor and memory has created the "memory wall" problem. Most cache designs are based on the LRU replacement policy, which is effective for workloads with high locality. However, LRU is ineffective for workloads whose working set exceeds the available cache size or whose memory access patterns have weak locality. To compensate for this weakness of LRU, we introduce a novel code-based cache partitioning mechanism that requires no hardware support. Our mechanism first collects profile data using binary instrumentation and then uses the collected code profiles to classify the characteristics of each code region. Finally, while the application is running, page coloring is used to partition the cache by code region. To demonstrate its effectiveness, we implemented the mechanism in the Linux kernel. Experiments on workloads that include weak-locality access patterns show that the proposed mechanism improves performance by up to 7.3% and reduces last-level cache misses by up to 37.8%.
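The abstract names two steps but does not spell out the policy: classify code regions from profiles, then use page coloring to partition the cache. The Python sketch below illustrates those two steps only. It is not the authors' implementation, which runs inside the Linux kernel. Several details are hypothetical assumptions: the cache geometry, the miss-ratio threshold, the choice of two colors for weak-locality regions, the idea that each allocated page is charged to one code region, and all names (`RegionProfile`, `classify`, `page_color`, `pick_frame`).

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# Hypothetical cache geometry; a real system would query the CPU for these.
PAGE_SIZE = 4096                  # bytes per page
LLC_SIZE = 2 * 1024 * 1024        # last-level cache capacity in bytes
LLC_WAYS = 16                     # associativity

# Frames whose numbers agree modulo NUM_COLORS map to the same cache sets.
NUM_COLORS = LLC_SIZE // (LLC_WAYS * PAGE_SIZE)

# Hypothetical policy: weak-locality regions share 2 colors, all others get the rest.
WEAK_COLORS: Set[int] = {0, 1}
NORMAL_COLORS: Set[int] = set(range(2, NUM_COLORS))

# Hypothetical threshold separating weak-locality code from reuse-friendly code.
WEAK_MISS_RATIO = 0.5


@dataclass
class RegionProfile:
    """Per-code-region counts, e.g. gathered offline with binary instrumentation."""
    name: str
    accesses: int
    misses: int


def page_color(phys_addr: int) -> int:
    """Color = the cache set-index bits that lie above the page offset."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS


def classify(profile: RegionProfile) -> str:
    """Label a code region 'weak' if its profiled miss ratio is high, else 'normal'."""
    if profile.accesses == 0:
        return "normal"
    ratio = profile.misses / profile.accesses
    return "weak" if ratio >= WEAK_MISS_RATIO else "normal"


def pick_frame(free_frames: Iterable[int], region_class: str) -> Optional[int]:
    """Return the first free frame number whose color is allowed for the class."""
    allowed = WEAK_COLORS if region_class == "weak" else NORMAL_COLORS
    for pfn in free_frames:
        if page_color(pfn * PAGE_SIZE) in allowed:
            return pfn
    return None  # no free frame of an allowed color


if __name__ == "__main__":
    stream = RegionProfile("stream_copy", accesses=1000, misses=950)
    lookup = RegionProfile("hash_probe", accesses=1000, misses=100)
    free = [0, 1, 2, 33, 40]
    for region in (stream, lookup):
        cls = classify(region)
        print(region.name, cls, pick_frame(free, cls))
```

With the assumed geometry there are 32 colors. Running the script prints `stream_copy weak 0` and `hash_probe normal 2`. Weak-locality code is confined to 2 of 32 colors, so its lines can evict only lines in 1/16 of the cache sets. The remaining sets stay available to code that reuses its data, which is the effect LRU alone cannot provide.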