Low-power inter-core communication through cache partitioning in embedded multiprocessors

Authors:
Chenjie Yu;Xiangrong Zhou;Peter Petrov
Affiliations:
University of Maryland, College Park;University of Hawaii, Manoa;University of Maryland, College Park
Venue:
Proceedings of the 22nd Annual Symposium on Integrated Circuits and System Design: Chip on the Dunes
Year:
2009

Abstract

We present an application-driven customization methodology for energy-efficient inter-core communication in embedded multiprocessors. The methodology leverages configurable cache architectures and integrates software and hardware support to achieve energy-efficient data sharing between producer and consumer tasks. It is especially beneficial for data-streaming applications that exploit pipeline parallelism, where computational phases are mapped to separate processor cores. The application-driven data cache partitioning achieves low-power, low-latency (no coherence misses) inter-core data sharing. The basic premise of the technique is to use cache partitioning to separate each producer/consumer task's private data from the several shared data buffers it uses. This partitioning yields two benefits: 1) data cache accesses issued by the processor and by the coherence mechanism need to access only a single cache partition instead of the entire cache structure, resulting in significant power reductions; 2) interference (caused by both processor and coherence activity) between the private data and the several shared data buffers is eliminated, which in turn enables the efficient implementation of application-driven remote cache updates at synchronization boundaries.
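The two benefits listed in the abstract can be illustrated with a small toy model. The sketch below is not the authors' design: it assumes a way-partitioned set-associative cache, counts the ways probed per access as a rough stand-in for access energy, and uses a trivial replacement rule. The class name `PartitionedCache`, the partition names (`private`, `buf_in`, `buf_out`), and all parameters are hypothetical and chosen only for illustration; coherence (snoop) traffic is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionedCache:
    """Toy cache model: each named partition owns a disjoint list of ways.

    Assumption (not from the paper): energy is approximated by the number
    of ways probed on each access.
    """
    num_sets: int
    line_size: int
    partitions: dict                 # partition name -> list of way indices
    ways_probed: int = 0             # running energy proxy
    tags: dict = field(default_factory=dict)  # (set index, way) -> tag

    def access(self, partition: str, addr: int) -> bool:
        """Look up addr in the given partition only; return True on a hit."""
        ways = self.partitions[partition]
        line = addr // self.line_size
        index, tag = line % self.num_sets, line // self.num_sets
        self.ways_probed += len(ways)          # only this partition is probed
        for way in ways:
            if self.tags.get((index, way)) == tag:
                return True
        # Miss: fill an empty way of this partition, else evict one of its
        # own ways (simplistic, deterministic victim choice).
        fallback = ways[tag % len(ways)]
        victim = next((w for w in ways if (index, w) not in self.tags), fallback)
        self.tags[(index, victim)] = tag
        return False


# A 4-way cache split into private data and two shared stream buffers.
cache = PartitionedCache(
    num_sets=4,
    line_size=32,
    partitions={"private": [0, 1], "buf_in": [2], "buf_out": [3]},
)

cache.access("private", 0)            # cold miss, fills a private way
for addr in range(0, 32 * 4 * 10, 32):
    cache.access("buf_in", addr)      # streaming traffic stays in its own way

still_cached = cache.access("private", 0)
print(still_cached, cache.ways_probed)
```

In this model the streaming accesses to `buf_in` never evict the private line, and each access probes one or two ways rather than all four. How closely this way-count proxy tracks real energy depends on the cache implementation, so it should be read as a qualitative illustration only.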