Alternative cache indexing (hashing) functions are a popular technique for reducing conflict misses, because they distribute cache accesses more uniformly across the sets in the cache. Although various alternative hashing functions have been shown to eliminate worst-case conflict behavior, no study has analyzed in depth the pathological behavior of such functions, which often results in performance slowdown. In this paper, we present an in-depth analysis of the pathological behavior of cache hashing functions. Based on this analysis, we propose two new hashing functions, prime modulo and prime displacement, that resist pathological behavior yet still eliminate worst-case conflict behavior in the L2 cache. We show that both schemes can be implemented in fast hardware using a set of narrow add operations, with negligible fragmentation in the L2 cache. We evaluate the schemes on 23 memory-intensive applications. For applications with non-uniform cache accesses, both prime modulo and prime displacement hashing achieve an average speedup of 1.27 over traditional hashing, without slowing down any of the 23 benchmarks. We also evaluate multiple prime displacement hashing functions used in conjunction with a skewed associative L2 cache. The skewed associative cache achieves a better average speedup, at the cost of some pathological behavior that slows down four applications by up to 7%.
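The abstract names the two hashing functions without defining them, so the following is only a minimal software sketch under stated assumptions, not the hardware design evaluated in the paper. It assumes that prime modulo indexing takes the block address modulo the largest prime not exceeding the number of sets, and that prime displacement indexing adds the tag multiplied by a constant to the traditional index, modulo the number of sets. The function names, the 2048-set cache size, and the constant `p = 17` are illustrative choices. The sketch compares how many distinct sets each function touches for a power-of-two stride, the classic worst case for traditional indexing.

```python
def largest_prime_at_most(n):
    """Return the largest prime <= n, using trial division (requires n >= 2)."""
    def is_prime(k):
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True

    while not is_prime(n):
        n -= 1
    return n


def traditional_index(block_addr, n_sets):
    """Traditional indexing: low-order bits of the block address."""
    return block_addr % n_sets


def prime_modulo_index(block_addr, n_sets):
    """Assumed form: block address modulo the largest prime <= n_sets.

    Sets numbered from that prime up to n_sets - 1 are never used,
    which is the source of the (small) fragmentation.
    """
    return block_addr % largest_prime_at_most(n_sets)


def prime_displacement_index(block_addr, n_sets, p=17):
    """Assumed form: (tag * p + traditional index) mod n_sets.

    p = 17 is an illustrative constant, not a value taken from the paper.
    """
    tag = block_addr // n_sets
    index = block_addr % n_sets
    return (tag * p + index) % n_sets


def distinct_sets(index_fn, block_addrs, n_sets):
    """Count how many different sets a sequence of block addresses maps to."""
    return len({index_fn(a, n_sets) for a in block_addrs})


if __name__ == "__main__":
    n_sets = 2048  # hypothetical cache with 2048 sets
    # 64 blocks spaced exactly n_sets apart: the worst case for
    # traditional indexing, since all of them share the same low bits.
    stride_addrs = [i * n_sets for i in range(64)]
    for fn in (traditional_index, prime_modulo_index, prime_displacement_index):
        print(fn.__name__, distinct_sets(fn, stride_addrs, n_sets))
```

With these assumed definitions, the strided accesses all collide in one set under traditional indexing, while both alternative functions spread them over 64 different sets. For 2048 sets the largest usable prime is 2039, so prime modulo indexing leaves 9 sets unused, which illustrates why the fragmentation can be small.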