Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Filtering directory lookups in CMPs with write-through caches
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Dynamic, multi-core cache coherence architecture for power-sensitive mobile processors
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Filtering directory lookups in CMPs
Microprocessors & Microsystems
Hi-index | 0.00 |
Maintaining local caches coherent in bus-based multiprocessor systems results in significantly elevated power consumption, as the bus snooping protocols require local cache lookups for each memory reference placed on the common bus. Such a conservative approach is warranted in general-purpose systems, where no prior knowledge regarding the communication structure between threads or processes is available. In such a general-purpose context the assumption is that each memory request is potentially a reference to a shared memory region, which may result in cache inconsistency, if no correcting activities are undertaken. The approach we propose exploits the fact that in embedded systems, important knowledge is available to the system designers regarding communication activities between tasks allocated to the different processor nodes. We demonstrate how the snoop-related cache probing activity can be drastically reduced by identifying in a deterministic way all the shared memory regions and the communication patterns between the processor nodes. Cache snoop activity is enabled only for the fraction of the bus transactions, which refer to locations belonging to known shared memory regions for each processor node; for the remaining larger part of memory references known to be of no relation to the given processor node, snoop probings in the local cache are completely disabled, thus saving a large amount of power. The experiments which we have performed on a number of important applications demonstrate the effectiveness of the proposed approach.