Low-power snoop architecture for synchronized producer-consumer embedded multiprocessing

Authors:
Chenjie Yu; Peter Petrov
Affiliations:
Department of Electrical and Computer Engineering, University of Maryland, College Park, MD; Department of Electrical and Computer Engineering, University of Maryland, College Park, MD
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2009

Abstract

We introduce a cross-layer customization methodology in which application knowledge about data sharing in producer-consumer relationships is used to aggressively eliminate unnecessary and predictable snoop-induced cache lookups, even for references to shared data, thereby achieving significant power reductions at minimal hardware cost. The technique exploits application-specific information about the exact producer-consumer relationships between tasks, as well as the precise timing of synchronized accesses to shared memory buffers by their corresponding producers and/or consumers. Snoop-induced cache lookups for accesses to shared data are eliminated whenever it is guaranteed that they would yield no additional knowledge about the cache state with respect to the other caches and the memory. Our experiments show average power reductions of more than 80% compared to a general-purpose snoop protocol.
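
The filtering idea described in the abstract can be illustrated with a small software model. The sketch below is a toy under stated assumptions, not the authors' hardware design: the `SharedBuffer` descriptor, the `needs_snoop_lookup` function, the core ids, and the addresses are all hypothetical, and the rule used here (only a buffer's registered producer and consumer need to look up their caches) ignores the timing information that the abstract also mentions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SharedBuffer:
    """Hypothetical descriptor for one producer-consumer buffer."""
    base: int      # first byte address of the buffer
    size: int      # buffer length in bytes
    producer: int  # id of the core that writes the buffer
    consumer: int  # id of the core that reads the buffer

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def needs_snoop_lookup(core: int, addr: int,
                       buffers: Sequence[SharedBuffer]) -> bool:
    """Decide whether `core` must look up its cache for a bus access to `addr`.

    Assumed toy rule: for an address inside a known buffer, only that
    buffer's producer and consumer can hold a copy, so every other core
    skips the lookup. Unknown addresses keep the conservative behavior.
    """
    for buf in buffers:
        if buf.contains(addr):
            return core in (buf.producer, buf.consumer)
    return True  # not a registered buffer: snoop as a general protocol would


# Example: four cores, core 0 produces into a buffer consumed by core 1.
buffers = [SharedBuffer(base=0x1000, size=0x100, producer=0, consumer=1)]
requester = 0
lookups = [c for c in range(4)
           if c != requester and needs_snoop_lookup(c, 0x1040, buffers)]
print(lookups)
```

In this model, a general-purpose snoop protocol would trigger a lookup in all three non-requesting caches, whereas the filter leaves only the consumer's cache to be checked. Accesses outside any registered buffer are still snooped by every core.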