Generating representative Web workloads for network and server performance evaluation
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
A generic architecture for on-chip packet-switched interconnections
DATE '00 Proceedings of the conference on Design, automation and test in Europe
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Proceedings of the 32nd annual international symposium on Computer Architecture
DBmbench: fast and accurate database workload representation on modern microarchitecture
CASCON '05 Proceedings of the 2005 conference of the Centre for Advanced Studies on Collaborative research
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Cooperative Caching for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Interconnect-Aware Coherence Protocols for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
A Statistical Traffic Model for On-Chip Interconnection Networks
MASCOTS '06 Proceedings of the 14th IEEE International Symposium on Modeling, Analysis, and Simulation
Design tradeoffs for tiled CMP on-chip networks
Proceedings of the 20th annual international conference on Supercomputing
Flattened Butterfly Topology for On-Chip Networks
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
An optimized multicore cache coherence design for exploiting communication locality
Proceedings of the great lakes symposium on VLSI
Dual partitioning multicasting for high-performance on-chip networks
Journal of Parallel and Distributed Computing
Hi-index | 0.03 |
Most CMPs use on-chip networks to connect cores and tend to integrate more simple cores on a single die. Low-radix networks, such as 2D-MESH, are widely used in tiled CMPs since they can be mapped to on-chip networks efficiently. However, low-radix networks introduce high network latency caused by long diameter. In this paper, we propose the use of group-caching design in NoC based multicore cache coherent systems. In our design, on-chip L2 banks are organized to form multiple groups. Each cache group behaves like a shared L2 cache for the cores inside cache group while the cache coherence between cache groups is maintained by coherence messages. Besides, group-caching also adopts the new cache replacement policy to improve the inefficient use of the aggregate L2 cache capacity. Compared to banked and shared L2 design, as most L2 accesses are served by local cache group, the hop count is significantly reduced. Experiment results based on full-system simulation show that for 2D-MESH, group-caching can increase the performance by 2%~8% compared to banked and shared L2 design, with network energy consumption reduced by 11%~13%. Experiment results also show that the communication overhead inside cache group plays an important role in the performance of group-caching.