With the advance of integration technology, it has become feasible to implement a microprocessor, a vector unit, and a multimegabyte bank-interleaved L2 cache on a single die. Parallel access to strided vectors in the L2 cache is a major performance issue on such vector microprocessors. The main difficulty is that one would like to interleave the cache at cache-block granularity, in order to benefit from spatial locality and to keep the tag volume low, while strided vector accesses naturally work at word granularity. In this paper, we address this issue. Considering a parallel vector unit with 2^n independent lanes, a 2^n-bank interleaved cache, and a cache line size of 2^k words, we show that any slice of 2^{n+k} consecutive elements of any strided vector with stride 2^r·R, where R is odd and r ≤ k, can be accessed in the L2 cache and routed back to the lanes in 2^k subslices of 2^n elements.
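The counting argument behind this claim can be checked by brute force: with line-interleaved banking, element i of the slice lives at word address base + i·2^r·R, and its bank is given by bits k..k+n-1 of that address. If every one of the 2^n banks is referenced exactly 2^k times across the slice, the references can be distributed into 2^k conflict-free subslices that each touch every bank once. The sketch below (a verification of this necessary counting condition, not the paper's actual routing network) enumerates the bank references for a few parameter choices; the function and parameter names are illustrative.

```python
from collections import Counter

def bank_histogram(n, k, r, R, base=0):
    """Count, per bank, the references made by a slice of 2**(n+k)
    consecutive elements of a vector with stride 2**r * R (R odd,
    r <= k), on a cache with 2**n banks interleaved on 2**k-word lines."""
    assert R % 2 == 1 and r <= k
    stride = (1 << r) * R
    banks = Counter()
    for i in range(1 << (n + k)):
        addr = base + i * stride             # word address of element i
        banks[(addr >> k) % (1 << n)] += 1   # line-interleaved bank index
    return banks

# Every bank is hit exactly 2**k times, for any odd R, any r <= k and
# any base, so the slice can be served as 2**k subslices of 2**n
# elements with one reference per bank in each subslice.
for (n, k, r, R, base) in [(2, 3, 1, 5, 7), (3, 2, 2, 3, 0), (1, 4, 4, 9, 13)]:
    h = bank_histogram(n, k, r, R, base)
    assert len(h) == 1 << n
    assert all(c == 1 << k for c in h.values())
```

Note that the condition r ≤ k matters: the subgroup of addresses generated by a stride of 2^r·R then covers the bank-index bits uniformly, which is exactly what the histogram check confirms.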