Cost-sensitive cache replacement algorithms
Cache replacement algorithms originally developed for uniprocessors executing one instruction at a time implicitly assume that all cache misses have the same cost. In modern systems, however, some cache misses are more expensive than others. The cost may be latency, penalty, power consumption, bandwidth consumption, or any other ad hoc numerical property attached to a miss. We call the class of replacement algorithms designed to minimize a nonuniform miss cost function "cost-sensitive replacement algorithms." In this paper, we first introduce and analyze an optimum cost-sensitive replacement algorithm (CSOPT) in the context of multiple nonuniform miss costs. CSOPT can significantly improve the cost function over OPT (the replacement algorithm that minimizes miss count) in large regions of the design space. Although CSOPT is an offline and unrealizable replacement policy, it provides a lower bound on the cost achievable by realistic cost-sensitive replacement algorithms. Using the practical example of latency cost in CC-NUMA multiprocessors, we demonstrate that in many situations current replacement algorithms leave substantial room for improvement beyond what OPT promises. Next, we introduce three practical extensions of LRU inspired by CSOPT and compare their performance to LRU, OPT, and CSOPT. Finally, as a practical application, we evaluate these realizable cost-sensitive replacement algorithms in the second-level caches of a CC-NUMA multiprocessor with superscalar processors, using miss latency as the cost function. By applying simple replacement policies that are sensitive to miss latency, we can improve the execution time of some parallel applications by up to 18 percent.
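To make the idea of a nonuniform miss cost concrete, the following is a minimal sketch that compares plain LRU with a toy cost-aware variant on a small fully associative cache. It is not one of the three LRU extensions evaluated in the paper, which the abstract does not specify. The policy name `cheapest_of_oldest_victim`, the `window` parameter, the trace, and the cost values (10 for a "remote" block, 1 for "local" blocks) are all assumptions chosen for illustration.

```python
from collections import OrderedDict


def simulate(trace, capacity, miss_cost, choose_victim):
    """Run a fully associative cache over `trace`.

    Returns (miss_count, total_miss_cost). `miss_cost` maps each block to
    the cost charged whenever that block misses.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    total_cost = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
            continue
        misses += 1
        total_cost += miss_cost[block]
        if len(cache) >= capacity:
            victim = choose_victim(list(cache), miss_cost)
            del cache[victim]
        cache[block] = True
    return misses, total_cost


def lru_victim(lru_order, miss_cost):
    """Classic LRU: evict the least recently used block, ignoring cost."""
    return lru_order[0]


def cheapest_of_oldest_victim(lru_order, miss_cost, window=2):
    """Hypothetical cost-aware policy (illustration only).

    Among the `window` least recently used blocks, evict the one that is
    cheapest to fetch again. Ties go to the older block.
    """
    candidates = lru_order[:window]
    return min(candidates, key=lambda b: miss_cost[b])


if __name__ == "__main__":
    # Assumed workload: block "A" is expensive to miss on, "B" and "C" are cheap.
    trace = ["A", "B", "C"] * 3
    cost = {"A": 10, "B": 1, "C": 1}
    print("LRU:       ", simulate(trace, 2, cost, lru_victim))
    print("cost-aware:", simulate(trace, 2, cost, cheapest_of_oldest_victim))
```

On this trace, LRU misses on every reference (9 misses, total cost 36), while the cost-aware variant keeps the expensive block resident and pays only for the cheap ones (7 misses, total cost 16). A policy this naive can also hold on to a costly block that is never referenced again, so it should be read as an illustration of the cost function rather than as a recommended design.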