Utilizing shared data in chip multiprocessors with the Nahalal architecture

Authors:
Zvika Guz;Idit Keidar;Avinoam Kolodny;Uri C. Weiser
Affiliations:
Technion - Israel Institute of Technology, Haifa, Israel;Technion - Israel Institute of Technology, Haifa, Israel;Technion - Israel Institute of Technology, Haifa, Israel;Technion - Israel Institute of Technology, Haifa, Israel
Venue:
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Year:
2008

Citing 21
Cited 8

Hitting the memory wall: implications of the obvious

ACM SIGARCH Computer Architecture News
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Generating representative Web workloads for network and server performance evaluation

SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A Single-Chip Multiprocessor

Computer
Simics: A Full System Simulation Platform

Computer
VLSI Architecture: Past, Present, and Future

ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance

Proceedings of the 31st annual international symposium on Computer architecture
Managing Wire Delay in Large Chip-Multiprocessor Caches

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors

Proceedings of the 32nd annual international symposium on Computer Architecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs

Proceedings of the 32nd annual international symposium on Computer Architecture
A NUCA substrate for flexible CMP cache sharing

Proceedings of the 19th annual international conference on Supercomputing
Performance, Power Efficiency and Scalability of Asymmetric Cluster Chip Multiprocessors

IEEE Computer Architecture Letters
Cooperative Caching for Chip Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
ASR: Adaptive Selective Replication for CMP Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proximity-aware directory-based coherence for multi-core processor architectures

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Virtual hierarchies to support server consolidation

Proceedings of the 34th annual international symposium on Computer architecture
Nahalal: Cache Organization for Chip Multiprocessors

IEEE Computer Architecture Letters
Enhancing L2 organization for CMPs with a center cell

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Reactive NUCA: near-optimal block placement and replication in distributed caches

Proceedings of the 36th annual international symposium on Computer architecture
A Novel Cache Organization for Tiled Chip Multiprocessor

APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
Dynamic thread and data mapping for NoC based CMPs

Proceedings of the 46th Annual Design Automation Conference
NoC-aware cache design for chip multiprocessors

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Cache equalizer: a placement mechanism for chip multiprocessor distributed shared caches

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
NoC-aware cache design for multithreaded execution on tiled chip multiprocessors

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Research note: C-AMTE: A location mechanism for flexible cache management in chip multiprocessors

Journal of Parallel and Distributed Computing
Exploiting replication to improve performances of NUCA-based CMP systems

ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper addresses a new cache organization in a Chip Multiprocessors (CMP) environment. We introduce Nahalal, an architecture whose novel floorplan topology partitions cached data according to its usage (shared versus private data), and thus enables fast access to shared data for all processors while preserving the vicinity of private data to each processor. The Nahalal architecture combines the best of both shared caches and private caches, enabling fast accesses to data as in private caches while eliminating the need for inter-cache coherence transactions. Detailed simulations in Simics demonstrate that Nahalal decreases cache access latency by up to 41.1% compared to traditional CMP designs, yielding performance gains of up to 12.65% in run time.