Most of today's microprocessors have an on-chip cache to reduce average memory access latency. These on-chip caches generally have low associativity and small sizes. Cache line conflicts are a major source of cache misses and significantly degrade overall system performance. This paper introduces an innovative, conflict-free cache design called the one's complement cache. By computing cache addresses in parallel with the memory addresses of data, the new design does not increase the critical hit time of cache accesses. Cache misses caused by line interference are minimized by evenly distributing the data items referenced by program loops across all sets in the cache. Even distribution of data in the cache is achieved by making the number of sets a prime or odd number, so that the chance of related data items mapping to the same set is small. Trace-driven simulations are used to evaluate the performance of the new design. Performance results on a set of programs from the SPEC92 benchmarks show that the new design improves cache performance over a conventional set-associative cache by about 100% with negligible additional hardware cost.
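The benefit of a prime or odd number of sets can be illustrated with a minimal sketch. The modulo-based set-index function below is an illustrative assumption, not the paper's actual one's complement hardware mapping: it simply shows that strided loop references collide badly when the set count is a power of two, but spread across all sets when the set count is an odd prime.

```python
# Hedged sketch: count how many distinct cache sets are touched by a
# sequence of strided references, for two choices of set count.
# The set-index function (address mod num_sets) is an assumption for
# illustration only; it is not the paper's one's complement mapping.

def sets_touched(num_sets, stride, num_refs):
    """Return the number of distinct sets hit by num_refs strided accesses."""
    return len({(i * stride) % num_sets for i in range(num_refs)})

# Power-of-two set count with a matching stride: every reference maps
# to the same set, so the loop thrashes a single cache line.
conflicting = sets_touched(num_sets=64, stride=64, num_refs=64)  # -> 1

# Odd (here prime) set count: the same strided references are spread
# evenly, touching every set exactly once.
spread = sets_touched(num_sets=61, stride=64, num_refs=61)  # -> 61
```

Because 61 is coprime to the stride, the references cycle through all 61 sets before repeating, which is the even-distribution property the abstract describes.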