DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism

Authors:
Byn Choi; Rakesh Komuravelli; Hyojin Sung; Robert Smolinski; Nima Honarmand; Sarita V. Adve; Vikram S. Adve; Nicholas P. Carter; Ching-Tsun Chou
Venue:
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Year:
2011

Citing 0
Cited 6

Why on-chip cache coherence is here to stay
Communications of the ACM

Complexity-effective multicore coherence
Proceedings of the 21st international conference on Parallel architectures and compilation techniques

Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture

A new perspective for efficient virtual-cache coherence
Proceedings of the 40th Annual International Symposium on Computer Architecture

Protozoa: adaptive granularity cache coherence
Proceedings of the 40th Annual International Symposium on Computer Architecture

Programming a Multicore Architecture without Coherency and Atomic Operations
Proceedings of Programming Models and Applications on Multicores and Manycores

Quantified Score

Hi-index	0.02

Abstract

For parallelism to become tractable for mass programmers, shared-memory languages and environments must evolve to enforce disciplined practices that ban "wild shared-memory behaviors," e.g., unstructured parallelism, arbitrary data races, and ubiquitous non-determinism. This software evolution is a rare opportunity for hardware designers to rethink hardware from the ground up to exploit opportunities exposed by such disciplined software models. Such a co-designed effort is more likely to achieve many-core scalability than a software-oblivious hardware evolution. This paper presents DeNovo, a hardware architecture motivated by these observations. We show how a disciplined parallel programming model greatly simplifies cache coherence and consistency, while enabling a more efficient communication and cache architecture. The DeNovo coherence protocol is simple because it eliminates transient states: verification using model checking shows 15X fewer reachable states than a state-of-the-art implementation of the conventional MESI protocol. The DeNovo protocol is also more extensible. Adding two sophisticated optimizations, flexible communication granularity and direct cache-to-cache transfers, did not introduce additional protocol states (unlike MESI). Finally, DeNovo shows better cache hit rates and lower network traffic, translating to better performance and energy efficiency. Overall, a disciplined shared-memory programming model allows DeNovo to seamlessly integrate message passing-like interactions within a global address space, improving design complexity, performance, and efficiency.
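
The claim that data-race-free software removes the need for transient coherence states can be illustrated with a toy model. The sketch below is not the paper's protocol: the three state names (Invalid, Valid, Registered), the `Registry` class, and the rule of self-invalidating every Valid copy at a phase boundary are assumptions made for illustration and are not stated in the abstract. It only shows that, if at most one core writes a given address per parallel phase, a write can take effect immediately and readers never wait on in-flight invalidations.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"      # no usable copy
    VALID = "V"        # readable copy, may be stale after a phase ends
    REGISTERED = "R"   # this cache holds the up-to-date copy


class Registry:
    """Hypothetical shared level: stores data or the id of the registered owner."""

    def __init__(self):
        self.data = {}
        self.owner = {}

    def register(self, core_id, addr, caches):
        prev = self.owner.get(addr)
        if prev is not None and prev != core_id:
            # Ownership moves in one step; no intermediate state is modeled.
            caches[prev].state[addr] = State.INVALID
        self.owner[addr] = core_id

    def read(self, addr, caches):
        owner = self.owner.get(addr)
        if owner is not None:
            return caches[owner].values[addr]
        return self.data.get(addr, 0)


class Cache:
    def __init__(self, core_id, registry):
        self.core_id = core_id
        self.registry = registry
        self.state = {}
        self.values = {}

    def load(self, addr, caches):
        if self.state.get(addr, State.INVALID) is State.INVALID:
            self.values[addr] = self.registry.read(addr, caches)
            self.state[addr] = State.VALID
        return self.values[addr]

    def store(self, addr, value, caches):
        self.values[addr] = value
        if self.state.get(addr) is not State.REGISTERED:
            self.registry.register(self.core_id, addr, caches)
            self.state[addr] = State.REGISTERED

    def end_phase(self):
        # Simplification: drop every Valid copy; Registered copies stay.
        for addr, st in self.state.items():
            if st is State.VALID:
                self.state[addr] = State.INVALID


registry = Registry()
caches = [Cache(0, registry), Cache(1, registry)]

# Phase 0: core 1 reads address 0x10 before anyone has written it.
first = caches[1].load(0x10, caches)
for c in caches:
    c.end_phase()

# Phase 1: core 0 is the only writer of 0x10 (no data race).
caches[0].store(0x10, 42, caches)
for c in caches:
    c.end_phase()

# Phase 2: core 1 re-reads and misses, because its old copy was self-invalidated.
second = caches[1].load(0x10, caches)
```

In this run, core 1 never receives an invalidation message from core 0; its stale copy disappears because it invalidates itself at the phase boundary, which is the kind of simplification a disciplined programming model makes possible.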