Cache coherence protocols: evaluation using a multiprocessor simulation model
ACM Transactions on Computer Systems (TOCS)
Implementation and performance of Munin
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Compiler code transformations for superscalar-based high performance systems
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Undecidability of static analysis
ACM Letters on Programming Languages and Systems (LOPLAS)
The undecidability of aliasing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Lazy release consistency for hardware-coherent multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Maximizing parallelism and minimizing synchronization with affine transforms
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Pointer analysis for multithreaded programs
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Unification-based pointer analysis with directional assignments
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Pointer and escape analysis for multithreaded programs
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Pointer analysis: haven't we solved this problem yet?
PASTE '01 Proceedings of the 2001 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors
Proceedings of the 2002 international symposium on Low power electronics and design
Imagine: Media Processing with Streams
IEEE Micro
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
An Evaluation of Fine-Grain Producer-Initiated Communication in Cache-Coherent Multiprocessors
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
A highly configurable cache architecture for embedded systems
Proceedings of the 30th annual international symposium on Computer architecture
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
The future of multiprocessor systems-on-chips
Proceedings of the 41st annual Design Automation Conference
Proceedings of the 18th annual international conference on Supercomputing
Coherence decoupling: making use of incoherence
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Exploring the energy efficiency of cache coherence protocols in single-chip multi-processors
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Temporal Streaming of Shared Memory
Proceedings of the 32nd annual international symposium on Computer Architecture
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd annual international symposium on Computer Architecture
Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking
Proceedings of the 32nd annual international symposium on Computer Architecture
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Interconnect-Aware Coherence Protocols for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Overview of the MPSoC design challenge
Proceedings of the 43rd annual Design Automation Conference
Cache coherence tradeoffs in shared-memory MPSoCs
ACM Transactions on Embedded Computing Systems (TECS)
The M5 Simulator: Modeling Networked Systems
IEEE Micro
Proximity-aware directory-based coherence for multi-core processor architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Software behavior oriented parallelization
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
A self-tuning configurable cache
Proceedings of the 44th annual Design Automation Conference
A framework for rapid system-level exploration, synthesis, and programming of multimedia MP-SoCs
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Energy-efficient MESI cache coherence with pro-active snoop filtering for multicore microprocessors
Proceedings of the 13th international symposium on Low power electronics and design
An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers I
Hi-index | 0.00 |
We present a framework for performance-, bandwidth-, and energy-efficient intercore communication in embedded MultiProcessor Systems-on-a-Chip (MPSoC). The methodology seamlessly integrates compiler, operating system, and hardware support to achieve a low-cost communication between synchronized producers and consumers. The technique is especially beneficial for data-streaming applications exploiting pipeline parallelism with computational phases mapped to separate cores. Code transformations utilizing a simple ISA support ensure that producer writes are propagated to consumers with a single interconnect transaction per cache block just prior to the producer exiting its synchronization region. Furthermore, in order to completely eliminate misses to shared data caused by interference with private data and also to minimize the cache energy, we integrate to the proposed framework a cache way partitioning policy based on a simple cache configurability support, which isolates the shared buffers from other cache traffic. This mechanism results in significant power savings since only a subset of the cache ways needs to be looked up for each cache access. The end result of the proposed framework is a single communication transaction per shared cache block between a producer and a consumer with no coherence misses on the consumer caches. Our experiments demonstrate significant reductions in interconnect traffic, cache misses, and energy for a set of multiprocessor benchmarks.