Vector access performance in parallel memories using skewed storage scheme
IEEE Transactions on Computers
On randomly interleaved memories
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Pseudo-randomly interleaved memory
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
A novel cache design for vector processing
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
A case for two-way skewed-associative caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Eliminating cache conflict misses through XOR-based placement functions
ICS '97 Proceedings of the 11th international conference on Supercomputing
Multiplicative, congruential random-number generators with multiplier ± 2k1 ± 2k2 and modulus 2p - 1
ACM Transactions on Mathematical Software (TOMS)
The design and performance of a conflict-avoiding cache
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Coding the Lehmer pseudo-random number generator
Communications of the ACM
Skewed Associativity Improves Program Performance and Enhances Predictability
IEEE Transactions on Computers
A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness
CASCON '94 Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research
An Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Reducing DRAM Latencies with an Integrated Memory Hierarchy Design
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Using Prime Numbers for Cache Indexing to Eliminate Conflict Misses
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
YAARC: yet another approach to further reducing the rate of conflict misses
The Journal of Supercomputing
Notary: Hardware techniques to enhance signatures
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A new IP lookup cache for high performance IP routers
Proceedings of the 47th Design Automation Conference
Efficient address mapping of shared cache for on-chip many-core architecture
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Hi-index | 14.98 |
Using alternative cache indexing/hashing functions is a popular technique to reduce conflict misses by achieving a more uniform cache access distribution across the sets in the cache. Although various alternative hashing functions have been demonstrated to eliminate the worst-case conflict behavior, no study has really analyzed the pathological behavior of such hashing functions that often results in performance slowdown. In this paper, we present an in-depth analysis of the pathological behavior of cache hashing functions. Based on the analysis, we propose two new hashing functions, prime modulo and odd-multiplier displacement, that are resistant to pathological behavior and yet are able to eliminate the worst-case conflict behavior in the L2 cache. We show that these two schemes can be implemented in fast hardware using a set of narrow addition operations, with negligible fragmentation in the L2 cache. We evaluate the schemes on 23 memory intensive applications. For applications that have nonuniform cache accesses, both prime modulo and odd-multiplier displacement hashing achieve an average speedup of 1.27 compared to traditional hashing, without slowing down any of the 23 benchmarks. We also evaluate using odd-multiplier displacement function with multiple multipliers in conjunction with a skewed associative L2 cache. The skewed associative cache achieves a better average speedup at the cost of some pathological behavior that slows down four applications by up to 7 percent.