Compiler-Controlled Cache Mapping Rules

  • Authors:
  • A R Wagner

  • Affiliations:
  • -

  • Venue:
  • -
  • Year:
  • 1995


Abstract

The gap between memory speed and CPU speed in current RISC machines is often bridged with one or more levels of set-associative cache. In programs that operate on dense matrices, performance is often limited by memory reference times rather than by the apparent arithmetic complexity. Attempts to improve performance by blocking the loops of the code may fail to achieve the promised speed because the submatrices they try to keep in cache exhibit self-interference: many distinct cache lines of some or all of the submatrices map into the same cache association set, which can cause essentially every reference to such cache lines to produce a cache miss. A new cache design is presented. This design allows the compiler to program the mappings used to select cache association sets from virtual addresses, so that a different mapping can be used for each array. It is shown how instructions to program such mappings can be inserted into straightforward loop-blocking code to allocate the cache so that distinct arrays referenced during the innermost loop nests occupy different areas of the cache, and so that these array blocks are mapped in such a way that no array exhibits self-interference. The blocks then remain co-resident in the cache during such loops. Any compiler that automatically "blocks" loops for increased locality can easily generate the proposed instructions.

Index terms and phrases: Cache, Dense matrix computation, Programmable cache mapping, Loop blocking, IBM RS/6000.
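The self-interference the abstract describes can be sketched numerically. The sketch below is illustrative only: the cache parameters, matrix size, and in particular the bit-selection "programmed" mapping are assumptions for demonstration, not the paper's actual hardware design or mapping rules.

```python
# Minimal sketch of cache self-interference in a blocked column access.
# All parameters here (line size, set count, matrix size) are assumptions.

LINE = 32    # bytes per cache line (assumed)
SETS = 256   # association sets: an 8 KB direct-mapped cache (assumed)

def conventional_set(addr):
    # Conventional mapping: the address bits just above the line offset.
    return (addr // LINE) % SETS

def programmed_set(addr):
    # Hypothetical compiler-programmed mapping for this array: take the
    # address bits above the 8 KB row stride as the set index, so that
    # consecutive rows of the block fall into distinct sets.
    return (addr >> 13) % SETS

N = 1024                 # matrix dimension; rows of N doubles (8 bytes each)
row_stride = N * 8       # 8192 bytes: an exact multiple of LINE * SETS

# Addresses of element (i, 0) for the first 16 rows of a column block.
addrs = [i * row_stride for i in range(16)]

# Conventional mapping: every row maps to set 0 -> self-interference,
# so walking down a column evicts the previous row's line each time.
print([conventional_set(a) for a in addrs])   # [0, 0, 0, ..., 0]

# Programmed mapping: the 16 rows occupy 16 distinct sets and can
# remain co-resident in the cache during the inner loop.
print([programmed_set(a) for a in addrs])     # [0, 1, 2, ..., 15]
```

The collision arises because the row stride (8192 bytes) is an exact multiple of LINE * SETS, so the conventional set-index bits are identical for every row; a per-array mapping that draws the set index from different address bits avoids this without changing the layout of the data.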