Factors Affecting False Sharing on Page-Granularity Cache-Coherent Shared-Memory Multiprocessors

Authors:
Vik Khera
Affiliations:
-
Venue:
Factors Affecting False Sharing on Page-Granularity Cache-Coherent Shared-Memory Multiprocessors
Year:
1994

Citing 0
Cited 1

False sharing problems in cluster-based disk arrays

Proceedings of the 1999 ACM symposium on Applied computing

Abstract

Efficiently supporting a shared-memory paradigm in a large-scale multiprocessor generally involves some form of data caching. One drawback of caching shared data is the cost of keeping the multiple copies coherent. One source of unnecessary coherence overhead is a problem known as false sharing. Unfortunately, the lack of a precise, universally accepted definition of false sharing hinders research on detecting and eliminating the problem. We articulate our intuitive notion of false sharing and address the problems encountered in previous attempts to define it. We motivate the importance of a concrete measure by demonstrating that coherence overhead related to false sharing comprises a significant portion of the coherence costs in real applications, especially when page-granularity coherence is required. We propose and experimentally evaluate an architecture-independent measure of the false sharing exhibited in a reference trace for cache lines of a specified size. The proposed measure attempts to summarize the impact of false sharing by approximating some factors and discarding others. The evaluation of this formulation reveals that such summary statistics lose too much information to be of practical use in predicting performance. We use this result to motivate experiments that determine the relative importance of the various workload and architectural factors that affect coherence data traffic. The conclusion from these experiments is that the precise interleaving order of memory references is the most significant factor affecting false-sharing coherence data traffic. Our methodology is to use execution-driven simulation of specific architectures and applications to generate memory reference traces, which are then analyzed off-line.
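
To make the idea of off-line trace analysis concrete, the following is a minimal sketch of how coherence misses in a reference trace might be split into true-sharing and false-sharing misses for a given block size. It is not the measure proposed in the report. It assumes a simple write-invalidate protocol, infinite caches, and one simplified classification rule: a miss after an invalidation counts as true sharing only if the word being accessed was written by another processor since the copy was lost. The function name, the trace format (processor, operation, word address), and the result keys are all hypothetical.

```python
from collections import defaultdict


def classify_coherence_misses(trace, block_size):
    """Count cold, true-sharing, and false-sharing misses in a trace.

    trace: iterable of (proc, op, addr) with op in {"R", "W"} and addr a
    word address. block_size: number of words per coherence block.
    Assumes write-invalidate coherence and infinite caches (a sketch only).
    """
    holders = defaultdict(set)  # block -> processors holding a valid copy
    # block -> {proc: words written by others since proc's copy was invalidated}
    lost = defaultdict(dict)
    counts = {"cold": 0, "true": 0, "false": 0}

    for proc, op, addr in trace:
        block = addr // block_size

        if proc not in holders[block]:  # miss
            if proc in lost[block]:
                # Copy was invalidated earlier: was this word actually modified?
                written = lost[block].pop(proc)
                counts["true" if addr in written else "false"] += 1
            else:
                # With infinite caches, any other miss is a first access.
                counts["cold"] += 1
            holders[block].add(proc)

        if op == "W":
            # Invalidate every other valid copy of the block.
            for other in holders[block] - {proc}:
                lost[block][other] = set()
            holders[block] = {proc}
            # Record the written word for every processor lacking a copy.
            for written in lost[block].values():
                written.add(addr)

    return counts


# Two processors touch different words that share a block when block_size is 4.
example = [(0, "W", 0), (1, "W", 1), (0, "R", 0), (1, "R", 1)]
print(classify_coherence_misses(example, block_size=4))
print(classify_coherence_misses(example, block_size=1))
```

In this sketch the same trace yields one false-sharing miss with four-word blocks and none with one-word blocks, which illustrates why larger coherence units such as pages tend to expose more false sharing. Reordering the references in the trace can also change the counts, in line with the abstract's conclusion about interleaving order.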