An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Proceedings of the 32nd annual international symposium on Computer Architecture
Organizing the Last Line of Defense before Hitting the Memory Wall for CMPs
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Cooperative Caching for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
POWER5 System microarchitecture
IBM Journal of Research and Development - POWER5 and packaging
Design space exploration for multicore architectures: a power/performance/thermal view
Proceedings of the 20th annual international conference on Supercomputing
ASR: Adaptive Selective Replication for CMP Caches
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Comparing memory systems for chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Adaptive set pinning: managing shared caches in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
In recent years, with the possible end of further improvements in single processor, more and more researchers shift to the idea of Chip Multiprocessors (CMPs). The burgeoning of multi-thread programs brings on dramatically increased inter-core communication. Unfortunately, traditional architectures fail to meet the challenge, as they conduct such a kind of communication on the last level of on-chip cache or even on the memory.This paper proposes a novel approach, called Collective Cache, to differentiate the access to shared/private data and handle data communication on the first level cache. In the proposed cache architecture, the share data found in the last level cache are moved into the Collective Cache, a L1 cache structure shared by all cores. We show that the mechanism this paper proposed can immensely enhance inter-processors communication, increase the usage efficiency of L1 cache and simplify data consistency protocol. Extensive analysis of this approach with Simics shows that it can reduce the L1 cache miss rate by 3.36%.