On randomly interleaved memories
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Segregating heap objects by reference behavior and lifetime
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Capturing dynamic memory reference behavior with adaptive cache topology
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious structure definition
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Reconsidering custom memory allocation
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Improved indexing for cache miss reduction in embedded systems
Proceedings of the 40th annual Design Automation Conference
IEEE Transactions on Computers
Automatic pool allocation: improving performance by controlling data structure layout in the heap
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Using Prime Numbers for Cache Indexing to Eliminate Conflict Misses
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Reducing cache misses by application-specific re-configurable indexing
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches
Proceedings of the 33rd annual international symposium on Computer Architecture
Smaller Split L-1 Data Caches for Multi-core Processing Systems
ISPAN '09 Proceedings of the 2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks
Custom memory allocation for free
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Dynamic threshold for imbalance assessment on load balancing for multicore systems
Computers and Electrical Engineering
Hi-index | 0.00 |
There have been numerous techniques proposed in the literature that aim to improve the performance of cache memories by reducing cache conflicts. These techniques were proposed over the past decade and each proposal independently claimed to reduce conflict misses. However, because the published results used different benchmarks and different experimental setups, it is not easy to compare them. In this paper we report a side-by-side comparison of these techniques. We also evaluate the suitability of some of these techniques for caches with higher set associativities. In addition to evaluating techniques for their impact on cache misses and average memory access times, we also evaluate the techniques for their ability in reducing the non-uniformity of cache accesses. The conclusion of our work is that, each application may benefit from a different technique and no single scheme works universally well for all applications. We also observe that, for the majority of applications, XORing (XOR) and Odd-multiplier indexing schemes perform reasonably well. Among programmable associativity techniques, B-cache performs better than column-associative and adaptive-caches, but column-associative caches require very minimal extensions to hardware. Uniformity of cache accesses is improved most by B-cache technique while column-associative cache also improves cache access uniformities. Based on the observation that different techniques benefit different applications, we explored the use of multiple, programmable addressing mechanisms, each addressing scheme designed for a specific application. We include some preliminary data using multiple addressing schemes.