Latencies of conflicting writes on contemporary multicore architectures

Authors:
Josef Weidendorfer; Michael Ott; Tobias Klug; Carsten Trinitis
Affiliations:
Technische Universität München, Lehrstuhl für Rechnertechnik und Rechnerorganisation/Parallelrechnerarchitektur, Garching bei München (all four authors)
Venue:
PaCT'07: Proceedings of the 9th International Conference on Parallel Computing Technologies
Year:
2007

Abstract

This paper provides a detailed investigation of the latency penalties caused by repeated memory writes from different threads of a parallel program to nearby memory cells. When such writes fall into the same cache line, copies of which are held in the caches of multiple processors, the so-called false sharing effect can be observed. Because the cache hierarchies of contemporary processor architectures manage data at cache-line granularity, this effect can unnecessarily slow down parallel code even though the threads never access the same data. In this contribution, a benchmark is presented that allows quantitative estimates of the consequences of false sharing. The results show that multicore architectures with shared caches can reduce the unwanted effects of false sharing.
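To make the effect concrete, the following is a minimal false-sharing microbenchmark in C using POSIX threads. It is a sketch in the spirit of the paper, not the authors' benchmark: it assumes 64-byte cache lines, and the names `run_false_sharing_test`, `adjacent`, and `padded` are hypothetical. Two threads each increment only their own counter. In the first layout the two counters sit next to each other and normally share one cache line; in the second layout each counter is aligned to its own line.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict ISO C modes */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NTHREADS 2
#define LINE_SIZE 64  /* ASSUMPTION: 64-byte cache lines (common on x86-64) */

/* Layout A: adjacent counters, normally placed in the same cache line. */
static _Alignas(LINE_SIZE) volatile uint64_t adjacent[NTHREADS];

/* Layout B: each counter is aligned to, and padded out to, its own line. */
struct padded_counter {
    _Alignas(LINE_SIZE) volatile uint64_t value;
};
static struct padded_counter padded[NTHREADS];

struct worker_arg {
    volatile uint64_t *counter;  /* the only location this thread writes */
    unsigned long iterations;
};

/* Each thread repeatedly writes its own counter; no data is logically shared. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    for (unsigned long i = 0; i < arg->iterations; i++)
        (*arg->counter)++;  /* volatile forces a real load/store per iteration */
    return NULL;
}

/* Hypothetical helper: runs NTHREADS writers on the chosen layout and returns
 * the elapsed wall-clock time in seconds. */
double run_false_sharing_test(int use_padding, unsigned long iterations)
{
    pthread_t tid[NTHREADS];
    struct worker_arg args[NTHREADS];
    struct timespec t0, t1;

    for (int t = 0; t < NTHREADS; t++) {
        args[t].counter = use_padding ? &padded[t].value : &adjacent[t];
        *args[t].counter = 0;
        args[t].iterations = iterations;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int t = 0; t < NTHREADS; t++) {
        if (pthread_create(&tid[t], NULL, worker, &args[t]) != 0) {
            perror("pthread_create");
            exit(EXIT_FAILURE);
        }
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Each thread owns its counter, so no increment can be lost. */
    for (int t = 0; t < NTHREADS; t++)
        assert(*args[t].counter == iterations);

    return (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

A driver could call `run_false_sharing_test(0, 100000000UL)` and `run_false_sharing_test(1, 100000000UL)` and compare the two times (compile with, e.g., `cc -O2 -pthread`). When the two threads run on cores with separate caches, one would expect the adjacent layout to be slower, because the shared line has to move back and forth between the caches on every write. Which cores the operating system picks matters; pinning the process to a chosen core pair (for example with Linux `taskset`) allows comparing pairs that do and do not share a cache, which is the kind of comparison the abstract refers to.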