A novel cache design for vector processing

Authors:
Qing Yang; Liping Wu Yang
Venue:
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Year:
1992

Abstract

This paper introduces a new cache design for vector computers, called the prime-mapped cache. By exploiting the special properties of a Mersenne prime, the design neither lengthens the critical path of the processor nor increases the cache access time relative to a direct-mapped cache. The prime-mapped cache minimizes the cache miss ratio caused by line interference, which previous investigators have shown to be critical for numerical applications. We show that significant performance gains are possible by adding the proposed cache to an existing vector computer, provided that application programs can be blocked. The gain grows as the speed gap between processors and memories widens. We develop an analytical performance model, based on a generic vector computation model, to study the performance of the design. Our preliminary analysis of various vector access patterns shows that the prime-mapped cache can improve performance by as much as a factor of 2 to 3 over a conventional direct-mapped cache in the vector processing environment. Moreover, the additional hardware cost introduced by the new mapping scheme is negligible.
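
The abstract does not spell out the mapping hardware, so the following is only a minimal sketch of the general idea, not the authors' implementation. It relies on a standard identity: because 2^c ≡ 1 (mod 2^c − 1), an address can be reduced modulo a Mersenne number by adding its c-bit fields together, with no division. The cache size (2^13 − 1 = 8191 lines), the function names, and the strided test pattern are all assumptions chosen for illustration.

```python
# Illustrative sketch only: names and sizes are hypothetical, not from the paper.

def mersenne_mod(addr: int, c: int) -> int:
    """Reduce addr modulo 2**c - 1 using only shifts, masks and adds."""
    m = (1 << c) - 1
    while addr > m:
        # Fold the high bits onto the low c bits; valid because 2**c ≡ 1 (mod m).
        addr = (addr & m) + (addr >> c)
    # The all-ones pattern is congruent to zero.
    return 0 if addr == m else addr


def direct_index(line_addr: int, c: int) -> int:
    """Conventional direct-mapped index: low c bits of the line address."""
    return line_addr & ((1 << c) - 1)


def prime_index(line_addr: int, c: int) -> int:
    """Prime-mapped index: line address modulo the Mersenne prime 2**c - 1."""
    return mersenne_mod(line_addr, c)


def distinct_lines(index_fn, stride: int, n: int, c: int) -> int:
    """Count how many different cache lines an n-element strided access touches."""
    return len({index_fn(i * stride, c) for i in range(n)})


C = 13  # assumption: 2**13 - 1 = 8191 is a Mersenne prime

if __name__ == "__main__":
    # A power-of-two stride, measured in cache lines, over 1024 vector elements.
    print("direct-mapped:", distinct_lines(direct_index, 512, 1024, C))
    print("prime-mapped: ", distinct_lines(prime_index, 512, 1024, C))
```

With these assumed parameters, the direct-mapped index sends the 1024 elements to only 16 distinct lines, while the prime-mapped index spreads them over 1024 distinct lines. This is because the stride shares no factor with 8191. It illustrates the kind of line interference the abstract refers to, under the stated assumptions.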