A Comparative Analysis of Cache Designs for Vector Processing

Authors:
Tong Sun;Qing Yang
Affiliations:
Univ. of Rhode Island, Kingston;Univ. of Rhode Island, Kingston
Venue:
IEEE Transactions on Computers
Year:
1999

Abstract

This paper presents an experimental study of cache memory designs for vector computers. We use an execution-driven simulator to evaluate the vector cache performance of a set of application programs from the Perfect Club and SPEC92 benchmark suites. Our simulation results uncover several important facts that were previously unknown. First, the prime-mapped cache that we recently proposed shows great performance potential in a vector processing environment: because of its conflict-free property, it performs significantly better than conventional cache designs for all applications considered. Second, performance results on the benchmarks indicate that data locality does exist in vector processing, although the effects of line size, associativity, replacement algorithm, and prefetching scheme on cache performance are very different from what has been commonly believed. A medium-size vector cache (e.g., 128 Kbytes) eliminates the need for a large number of interleaved memory banks in vector computers. Our experiments show that a vector computer with a medium-size prime-mapped cache, a small cache line size, and a limited amount of prefetching provides significant speedup over conventional vector computers without a cache. The performance results reported in this paper can also guide designers of general-purpose computers in enhancing cache performance for numerical applications.
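
The conflict-free property mentioned in the abstract can be illustrated with a small sketch. It is not the authors' simulator. It assumes, as the name suggests, that a prime-mapped cache indexes lines modulo a prime number of lines, and it compares this with a conventional direct-mapped cache that indexes modulo a power of two. The index width `C`, the one-word-per-line simplification, and the helper names `distinct_lines` and `compare` are hypothetical choices made for illustration only.

```python
# Illustrative sketch only: counts how many distinct cache lines a strided
# vector access touches under two indexing schemes.
# Assumptions (not from the paper): one word per cache line, word addresses,
# and a 13-bit index, so the prime cache has 2**13 - 1 = 8191 lines.

C = 13                      # hypothetical index width in bits
DIRECT_LINES = 1 << C       # 8192 lines, conventional direct-mapped cache
PRIME_LINES = (1 << C) - 1  # 8191 lines, a Mersenne prime


def distinct_lines(num_lines, stride, length, base=0):
    """Return the number of distinct line indices touched by a vector of
    `length` elements starting at `base` with the given `stride`."""
    return len({(base + i * stride) % num_lines for i in range(length)})


def compare(stride, length=1024):
    """Return (direct_mapped_count, prime_mapped_count) for one vector access."""
    return (
        distinct_lines(DIRECT_LINES, stride, length),
        distinct_lines(PRIME_LINES, stride, length),
    )


if __name__ == "__main__":
    for stride in (1, 512, 8191):
        direct, prime = compare(stride)
        print(f"stride={stride:5d}  direct-mapped={direct:5d}  prime-mapped={prime:5d}")
```

Under these assumptions, a 1024-element vector with stride 512 maps onto only 16 lines of the power-of-two cache, so its elements evict one another, while the prime modulus spreads the same vector over 1024 distinct lines. The last stride shows the limit of the scheme in this sketch: a stride that is a multiple of the prime itself maps every element to the same line.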