Vector access performance in parallel memories using skewed storage scheme
IEEE Transactions on Computers
Organization and performance of a two-level virtual-real cache hierarchy
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
On randomly interleaved memories
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Pseudo-randomly interleaved memory
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
A case for two-way skewed-associative caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Avoiding conflict misses dynamically in large direct-mapped caches
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Streamlining data cache access with fast address calculation
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Zero-cycle loads: microarchitecture support for reducing load latency
Proceedings of the 28th annual international symposium on Microarchitecture
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References
IEEE Transactions on Computers
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The performance potential of data dependence speculation & collapsing
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Eliminating cache conflict misses through XOR-based placement functions
ICS '97 Proceedings of the 11th international conference on Supercomputing
Speculative execution via address prediction and data prefetching
ICS '97 Proceedings of the 11th international conference on Supercomputing
Cache miss equations: an analytical representation of cache misses
ICS '97 Proceedings of the 11th international conference on Supercomputing
The design and performance of a conflict-avoiding cache
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ACM Computing Surveys (CSUR)
Analysis of Cache Performance for Operating Systems and Multiprogramming
Analysis of Cache Performance for Operating Systems and Multiprogramming
PARLE '93 Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe
Memory Address Prediction for Data Speculation
Euro-Par '97 Proceedings of the Third International Euro-Par Conference on Parallel Processing
Advanced performance features of the 64-bit PA-8000
COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Analytical cache models with applications to cache partitioning
ICS '01 Proceedings of the 15th international conference on Supercomputing
IEEE Transactions on Computers
Improving cache hit ratio by extended referencing cache lines
Journal of Computing Sciences in Colleges
Highly accurate and efficient evaluation of randomising set index functions
Journal of Systems Architecture: the EUROMICRO Journal
Dynamic Partitioning of Shared Cache Memory
The Journal of Supercomputing
A proposal for input-sensitivity analysis of profile-driven optimizations on embedded applications
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Skewed caches from a low-power perspective
Proceedings of the 2nd conference on Computing frontiers
IEEE Transactions on Computers
Reducing cache misses by application-specific re-configurable indexing
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Optimizing instruction cache performance of embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Application-specific reconfigurable XOR-indexing to eliminate cache conflict misses
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Using Indexing Functions to Reduce Conflict Aliasing in Branch Prediction Tables
IEEE Transactions on Computers
Compiler techniques for reducing data cache miss rate on a multithreaded architecture
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Hardware assistance for trustworthy systems through 3-D integration
Proceedings of the 26th Annual Computer Security Applications Conference
Data layout for cache performance on a multithreaded architecture
Transactions on high-performance embedded architectures and compilers III
Inspection resistant memory: architectural support for security from physical examination
Proceedings of the 39th Annual International Symposium on Computer Architecture
Spatiotemporal Coherence Tracking
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
A cache design for probabilistically analysable real-time systems
Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.01 |
Applications with regular patterns of memory access can experience high levels of cache conflict misses. In shared-memory multiprocessors conflict misses can be increased significantly by the data transpositions required for parallelization. Techniques such as blocking which are introduced within a single thread to improve locality, can result in yet more conflict misses. The tension between minimizing cache conflicts and the other transformations needed for efficientparallelization leads to complex optimization problems for parallelizing compilers. This paper shows how the introduction of a pseudorandom element into the cache index function can effectively eliminate repetitive conflict misses and produce a cache where miss ratio depends solely on working set behavior. We examine the impact of pseudorandom cache indexing on processor cycle times and present practical solutions to some of the major implementation issues for this type of cache. Our conclusions are supported by simulations of a superscalar out-of-order processor executing the SPEC95 benchmarks, as well as from cache simulations of individual loop kernels to illustrate specific effects. We present measurements of Instructions committed Per Cycle (IPC) when comparing the performance of different cache architectures on whole-program benchmarks such as the SPEC95 suite.