The performance of the on-chip cache is critical to processor performance. The multithreaded programming model usually employed on on-chip many-core architectures may affect cache access patterns and, in turn, cache conflict-miss behavior. However, how the cache behaves under this model is still unclear, and little is known about the effectiveness of XOR mapping schemes for many-core systems. This paper addresses these problems. We propose an XOR-based address mapping scheme for on-chip many-core architectures to improve the performance of the cache system. We then evaluate the proposed scheme on several applications: a bioinformatics application, matrix multiplication, and the LU decomposition and FFT kernels from the SPLASH-2 benchmarks. Experimental results show that the proposed scheme reduces conflict misses in the shared cache by about 53% on average and improves overall performance by about 6%. The results also show that the XOR scheme is more cost-effective than a victim cache scheme.
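The abstract does not spell out the mapping function itself. As a generic illustration only (not the paper's scheme), the sketch below contrasts conventional modulo set indexing with a common XOR-folding index, in which the low tag bits are XORed into the conventional set-index bits. The 64-byte line size, the 256-set cache, and every function name are hypothetical choices made for the example.

```python
# Generic illustration of XOR-based set indexing (not the paper's exact scheme).
# Assumed, hypothetical cache geometry: 64-byte lines, 256 sets.
LINE_BYTES = 64
NUM_SETS = 256
OFFSET_BITS = LINE_BYTES.bit_length() - 1  # 6 bits of byte offset
SET_BITS = NUM_SETS.bit_length() - 1       # 8 bits of set index


def modulo_index(addr):
    """Conventional indexing: the set is the low bits of the line address."""
    return (addr >> OFFSET_BITS) & (NUM_SETS - 1)


def xor_index(addr):
    """XOR indexing: fold the next SET_BITS (tag) bits into the set index."""
    line = addr >> OFFSET_BITS
    low = line & (NUM_SETS - 1)
    high = (line >> SET_BITS) & (NUM_SETS - 1)
    return low ^ high


def sets_touched(index_fn, addresses):
    """Count how many distinct sets a sequence of addresses maps to."""
    return len({index_fn(a) for a in addresses})


# Walk down one column of a row-major matrix whose row size equals the
# cache's set span (NUM_SETS * LINE_BYTES = 16 KiB): a classic conflict pattern.
ROW_BYTES = NUM_SETS * LINE_BYTES
column = [i * ROW_BYTES for i in range(64)]

print(sets_touched(modulo_index, column))  # 1: every access lands in set 0
print(sets_touched(xor_index, column))     # 64: accesses spread over 64 sets
```

With modulo indexing, all 64 column elements compete for a single set, so anything less than 64-way associativity produces conflict misses. The XOR index spreads them across 64 sets. Because the folded bits still belong to the stored tag, the original line address remains recoverable from the tag and the set index.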