Introducing a New Cache Design into Vector Computers
IEEE Transactions on Computers
This paper introduces an innovative cache design for vector computers, called the prime-mapped cache. By exploiting the special properties of a Mersenne prime, the new design neither lengthens the critical path of a processor nor increases the cache access time relative to a direct-mapped cache. The prime-mapped cache minimizes the cache miss ratio caused by line interference, which previous investigators have shown to be critical for numerical applications. We show that significant performance gains are possible by adding the proposed cache memory to an existing vector computer, provided that application programs can be blocked. The performance gain grows as the speed gap between processors and memories widens. We develop an analytical performance model, based on a generic vector computation model, to study the performance of the design. Our preliminary performance analysis of various vector access patterns shows that the prime-mapped cache can improve performance by as much as a factor of 2 to 3 over the conventional direct-mapped cache in the vector processing environment. Moreover, the additional hardware cost introduced by the new mapping scheme is negligible.
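The abstract does not spell out the mapping function, so the following is only a minimal sketch of the general idea, under the assumption that a prime-mapped cache with 2^c - 1 lines selects a line as the line address modulo the Mersenne prime 2^c - 1. Because 2^c is congruent to 1 modulo 2^c - 1, that remainder can be formed by adding c-bit fields of the address with end-around carry, with no division. The names `mersenne_mod`, `distinct_lines`, and `compare`, and the choice c = 5, are illustrative and not taken from the paper.

```python
def mersenne_mod(x: int, c: int) -> int:
    """Reduce a non-negative x modulo 2**c - 1 using only shifts, masks and adds."""
    m = (1 << c) - 1
    while x > m:
        # 2**c == 1 (mod m), so the high bits can be folded onto the low bits.
        x = (x & m) + (x >> c)
    return 0 if x == m else x


def distinct_lines(stride: int, length: int, index_fn) -> int:
    """Count how many different cache lines a strided vector access touches."""
    return len({index_fn(i * stride) for i in range(length)})


C = 5  # assumed index width: 2**5 - 1 = 31 is a Mersenne prime


def direct_index(line_addr: int) -> int:
    """Conventional direct-mapped index: low C bits, 2**C = 32 lines."""
    return line_addr & ((1 << C) - 1)


def prime_index(line_addr: int) -> int:
    """Assumed prime-mapped index: line address mod 31, 31 lines."""
    return mersenne_mod(line_addr, C)


def compare(stride: int, length: int = 31) -> tuple:
    """Return (lines used by direct mapping, lines used by prime mapping)."""
    return (distinct_lines(stride, length, direct_index),
            distinct_lines(stride, length, prime_index))


if __name__ == "__main__":
    for stride in (1, 8, 32, 31):
        print(stride, compare(stride))
```

In this toy setting, a 31-element vector with stride 8 or 32 collapses onto 4 lines or 1 line under direct mapping but spreads over all 31 lines under the assumed prime mapping. Only strides that are multiples of 31 collide, which is why power-of-two strides common in numerical code stop interfering with themselves.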