MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors
Proceedings of the 2002 international symposium on Low power electronics and design
Imagine: Media Processing with Streams
IEEE Micro
A highly configurable cache architecture for embedded systems
Proceedings of the 30th annual international symposium on Computer architecture
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
The future of multiprocessor systems-on-chips
Proceedings of the 41st annual Design Automation Conference
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd annual international symposium on Computer Architecture
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Cache coherence tradeoffs in shared-memory MPSoCs
ACM Transactions on Embedded Computing Systems (TECS)
The M5 Simulator: Modeling Networked Systems
IEEE Micro
A self-tuning configurable cache
Proceedings of the 44th annual Design Automation Conference
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Energy-efficient MESI cache coherence with pro-active snoop filtering for multicore microprocessors
Proceedings of the 13th international symposium on Low power electronics and design
Hi-index | 0.00 |
We present an application-driven customization methodology for energy-efficient inter-core communication in embedded multiprocessors. The methodology leverages configurable cache architectures and integrates software and hardware support to achieve energy-efficient data sharing between producer and consumer tasks. The technique is especially beneficial for data-streaming applications exploiting pipeline parallelism where computational phases are mapped to separate processor cores. The application-driven data cache partitioning achieves low-power and low-latency (no coherence misses) inter-core data sharing. The basic premise of the proposed technique is to separate through cache partitioning the private data from the several shared data buffers used by each producer/consumer task. Such partitioning will result in the following benefits: 1) Data cache accesses caused by the processor and the coherence mechanism will need to access only a cache partition instead of the entire cache structure, resulting in significant power reductions; 2) Interference (caused by both processor and coherence activities) across private data and the several shared data buffers is eliminated - this in turn enables the efficient implementation of application-driven remote cache updates at synchronization boundaries.