Codesign of NoC and Cache Organization for Reducing Access Latency in Chip Multiprocessors

Authors:
Ahmed Abousamra;Alex K. Jones;Rami Melhem
Affiliations:
University of Pittsburgh, Pittsburgh;University of Pittsburgh, Pittsburgh;University of Pittsburgh, Pittsburgh
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2012

Citing 0
Cited 1

Ordering circuit establishment in multiplane NoCs

ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies

Quantified Score

Hi-index	0.00

Visualization

Abstract

Reducing data access latency is vital to achieving performance improvements in computing. For chip multiprocessors (CMPs), data access latency depends on the organization of the memory hierarchy, the on-chip interconnect, and the running workload. Several network-on-chip (NoC) designs exploit communication locality to reduce communication latency by configuring special fast paths or circuits on which communication is faster than the rest of the NoC. However, communication patterns are directly affected by the cache organization and many cache organizations are designed in isolation of the underlying NoC or assume a simple NoC design, thus possibly missing optimization opportunities. In this work, we take a codesign approach of the NoC and cache organization. First, we propose a hybrid circuit/packet-switched NoC that exploits communication locality through periodic configuration of the most beneficial circuits. Second, we design a Unique Private (UP) caching scheme targeting the class of interconnects which exploit communication locality to improve communication latency. The Unique Private cache stores the data that are mostly accessed by each processor core in the core's locally accessible cache bank, while leveraging dedicated high-speed circuits in the interconnect to provide remote cores with fast access to shared data. Simulations of a suite of scientific and commercial workloads show that our proposed design achieves a speedup of 15.2 and 14 percent on a 16-core and a 64-core CMP, respectively, over the state-of-the-art NoC-Cache codesigned system that also exploits communication locality in multithreaded applications.