Leveraging on-chip networks for data cache migration in chip multiprocessors

Authors:
Noel Eisley; Li-Shiuan Peh; Li Shang
Affiliations:
Princeton University, Princeton, NJ, USA; Princeton University, Princeton, NJ, USA; University of Colorado - Boulder, Boulder, CO, USA
Venue:
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Year:
2008

Abstract

Recently, chip multiprocessors (CMPs) have arisen as the de facto design for modern high-performance processors, with increasing core counts. An important property of CMPs is that remote, but on-chip, L2 cache accesses are less costly than off-chip accesses; this is in contrast to earlier chip-to-chip or board-to-board multiprocessors, where an access to a remote node is at least as costly as a main memory access. This motivates on-chip cache migration as a means to retain more data on-chip. However, previously proposed techniques do not scale to high core counts: they neither leverage the on-chip caches of all cores nor provide a scalable migration mechanism. In this paper we propose a scalable in-network migration technique which uses hints embedded within the router microarchitecture to steer L2 cache evictions towards free/invalid cache slots in any on-chip core cache, rather than evicting them off-chip. We show that our technique can provide an average 19% reduction in the number of off-chip memory accesses over the state-of-the-art, beating the performance of a pseudo-optimal migration technique. This can be done with negligible area overhead and a manageable traffic overhead of 13.4%.
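
The steering idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's router microarchitecture: it assumes a 2D mesh of tiles, a per-tile count of free/invalid L2 slots, and a per-router hint that names the next hop toward the nearest tile with a free slot. The hints here are computed globally with a breadth-first search purely for illustration, and all names (`compute_hints`, `steer_eviction`, `free_slots`) are hypothetical.

```python
from collections import deque


def neighbors(tile, width, height):
    """Yield the mesh neighbors of a tile given as (x, y)."""
    x, y = tile
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def compute_hints(width, height, free_slots):
    """Map each reachable tile to its next hop toward the nearest free slot.

    Assumption: hints come from a global multi-source BFS. A tile that
    itself has a free slot maps to itself.
    """
    hints = {}
    queue = deque()
    for tile, free in free_slots.items():
        if free > 0:
            hints[tile] = tile
            queue.append(tile)
    while queue:
        cur = queue.popleft()
        for nb in neighbors(cur, width, height):
            if nb not in hints:
                hints[nb] = cur  # from nb, step toward cur
                queue.append(nb)
    return hints


def steer_eviction(source, width, height, free_slots):
    """Follow hints hop by hop from the evicting tile.

    Returns (destination, path). Returns (None, []) when no tile has a
    free slot, which models falling back to an off-chip eviction.
    """
    hints = compute_hints(width, height, free_slots)
    if source not in hints:
        return None, []
    path = [source]
    tile = source
    while hints[tile] != tile:
        tile = hints[tile]
        path.append(tile)
    free_slots[tile] -= 1  # the migrated line now occupies the slot
    return tile, path


if __name__ == "__main__":
    # 2x2 mesh in which only tile (1, 1) has a free slot.
    free = {(0, 0): 0, (1, 0): 0, (0, 1): 0, (1, 1): 1}
    print(steer_eviction((0, 0), 2, 2, free))
    print(steer_eviction((0, 0), 2, 2, free))  # no slot left: off-chip
```

In this model the first eviction from tile (0, 0) migrates to tile (1, 1) over two hops, and the second finds no free slot and falls back to an off-chip eviction. A hardware design would maintain such hints incrementally inside the routers rather than recomputing them per eviction.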