Cache Injection: A Novel Technique for Tolerating Memory Latency in Bus-Based SMPs

Authors:
Aleksandar Milenkovic; Veljko M. Milutinovic
Venue:
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Year:
2000

References (10):
Effective cache prefetching on bus-based multiprocessors. ACM Transactions on Computer Systems (TOCS).
Tolerating latency through software-controlled data prefetching.
The SPLASH-2 programs: characterization and methodological considerations. ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture.
Architectural mechanisms for explicit communication in shared memory multiprocessors. Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing.
A compiler algorithm that reduces read latency in ownership-based cache coherence protocols. PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques.
Data Forwarding in Scalable Shared-Memory Multiprocessors. IEEE Transactions on Parallel and Distributed Systems.
Parallel Computer Architecture: A Hardware/Software Approach.
Two techniques for improving performance on bus-based multiprocessors. HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture.
An Evaluation of Fine-Grain Producer-Initiated Communication in Cache-Coherent Multiprocessors. HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture.
The Impact of Speeding up Critical Sections with Data Prefetching and Forwarding. ICPP '96 Proceedings of the 1996 International Conference on Parallel Processing - Volume 3.

Cited by (11):
An Architecture and Task Scheduling Algorithm for Systems Based on Dynamically Reconfigurable Shared Memory Clusters. IWCC '01 Proceedings of the NATO Advanced Research Workshop on Advanced Environments, Tools, and Applications for Cluster Computing-Revised Papers.
A Parallel System Architecture Based on Dynamically Configurable Shared Memory Clusters. PPAM '01 Proceedings of the International Conference on Parallel Processing and Applied Mathematics-Revised Papers.
Task Scheduling for Dynamically Configurable Multiple SMP Clusters Based on Extended DSC Approach. PPAM '01 Proceedings of the International Conference on Parallel Processing and Applied Mathematics-Revised Papers.
Atomic operations for task scheduling for systems based on communication on-the-fly between SMP clusters. ISPDC'03 Proceedings of the Second international conference on Parallel and distributed computing.
Dynamic SMP clusters with communication on the fly. ISPDC'03 Proceedings of the Second international conference on Parallel and distributed computing.
Multi-CMP system with data communication on the fly. The Journal of Supercomputing.
Dynamic SMP clusters in SoC technology – towards massively parallel fine grain numerics. PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics.
Scheduling architecture-supported regions in parallel programs. PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I.
Data transfers on the fly for hierarchical systems of chip multi-processors. PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I.
Scheduling parallel programs based on architecture-supported regions. PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part II.
Parallel matrix multiplication based on dynamic SMP clusters in SoC technology. ISPA'07 Proceedings of the 2007 international conference on Frontiers of High Performance Computing and Networking.

Abstract

Cache misses and bus traffic are key obstacles to achieving high performance in bus-based shared-memory multiprocessors that use invalidation-based snooping caches. Software-controlled techniques for tolerating memory latency, such as cache prefetching and data forwarding, can be used to overcome these problems. However, some previous studies have shown that cache prefetching is not very effective in bus-based shared-memory multiprocessors, while data forwarding is difficult to implement in this environment. In this paper, we propose a novel technique called cache injection, which combines the consumer-initiated and producer-initiated approaches and exploits the broadcast nature of the bus. A performance evaluation based on program-driven simulation of a set of eight parallel benchmark programs shows that cache injection is highly effective in reducing coherence misses and bus traffic.
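The abstract describes cache injection only at a high level. The Python sketch below is a toy model built on stated assumptions, not the authors' design: each processor keeps a hypothetical "injection table" of address windows it wants to consume (the consumer-initiated part), a hypothetical `push` operation lets the producer place a block on the bus once (the producer-initiated part), and any cache whose window covers a block seen on the bus captures a copy (the broadcast nature of the bus). To keep the code short, writes also go through to memory. The example contrasts plain write-invalidate snooping with injection for one producer and several consumers that repeatedly read freshly written data.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """Toy per-processor cache: block address -> value, unlimited capacity."""
    name: str
    lines: dict = field(default_factory=dict)
    # Hypothetical injection table: address windows [lo, hi) that this
    # processor has declared it wants to consume.
    injection_table: list = field(default_factory=list)

    def wants(self, addr):
        return any(lo <= addr < hi for lo, hi in self.injection_table)


class Bus:
    """Shared snooping bus with write-invalidate coherence (toy model;
    writes also go through to memory so read misses see the latest value)."""

    def __init__(self, caches, inject=False):
        self.caches = caches
        self.inject = inject
        self.memory = {}
        self.transactions = 0
        self.misses = 0

    def _snarf(self, source, addr, value):
        # Broadcast nature of the bus: every other cache whose injection
        # table covers addr captures the block as it passes by.
        if not self.inject:
            return
        for other in self.caches:
            if other is not source and other.wants(addr):
                other.lines[addr] = value

    def read(self, cache, addr):
        if addr in cache.lines:
            return cache.lines[addr]  # hit: no bus traffic
        self.misses += 1
        self.transactions += 1  # bus read to fetch the block
        value = self.memory.get(addr, 0)
        cache.lines[addr] = value
        self._snarf(cache, addr, value)  # consumer-initiated injection
        return value

    def write(self, cache, addr, value):
        self.transactions += 1  # invalidation broadcast on the bus
        for other in self.caches:
            if other is not cache:
                other.lines.pop(addr, None)
        cache.lines[addr] = value
        self.memory[addr] = value

    def push(self, producer, addr):
        # Hypothetical producer-initiated operation: put the block on the
        # bus once so that interested caches can capture it.
        self.transactions += 1
        self._snarf(producer, addr, producer.lines[addr])


def run(inject, push, n_consumers=4, n_blocks=8, rounds=3):
    """Return (read misses, bus transactions) for one configuration."""
    producer = Cache("P0")
    consumers = [Cache(f"P{i}") for i in range(1, n_consumers + 1)]
    bus = Bus([producer] + consumers, inject=inject)
    if inject:
        for c in consumers:  # each consumer opens a window on the shared data
            c.injection_table.append((0, n_blocks))
    for r in range(rounds):
        for a in range(n_blocks):
            bus.write(producer, a, 100 * r + a)
            if push:
                bus.push(producer, a)
        for c in consumers:
            for a in range(n_blocks):
                assert bus.read(c, a) == 100 * r + a
    return bus.misses, bus.transactions


if __name__ == "__main__":
    for inject, push in [(False, False), (True, False), (True, True)]:
        misses, traffic = run(inject, push)
        print(f"inject={inject!s:5} push={push!s:5} "
              f"read misses={misses:3} bus transactions={traffic:3}")
```

In this toy setting with the default parameters, the baseline repeats a coherence miss for every consumer and every block in each round (96 misses, 120 bus transactions). Consumer windows alone let the first consumer's read miss feed the others (24 misses, 48 transactions), and adding producer pushes removes the consumer misses entirely (0 misses, 48 transactions). These numbers only illustrate the mechanism and are not results from the paper.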