False sharing and its effect on shared memory performance

Authors:
William J. Bolosky;Michael L. Scott
Affiliations:
Microsoft Research Laboratory, Redmond, WA;Computer Science Department, University of Rochester, Rochester, NY
Venue:
Sedms'93 USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems - Volume 4
Year:
1993

Citing 13
Cited 14

An open enviornment for building parallel programming systems

PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Simple but effective techniques for NUMA memory management

SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
The implementation of a coherent memory abstraction on a NUMA multiprocessor: experiences with platinum

SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
NUMA policies and their relation to memory architecture

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Distributed Shared Memory: A Survey of Issues and Algorithms

Computer - Distributed computing systems: separate resources acting as one
Experimental comparison of memory management policies for NUMA multiprocessors

ACM Transactions on Computer Systems (TOCS)
Implementation and performance of Munin

SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
SPLASH: Stanford parallel applications for shared-memory

ACM SIGARCH Computer Architecture News
The DASH prototype: implementation and performance

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Adjustable block size coherent caches

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Memory consistency models

ACM SIGOPS Operating Systems Review
Software coherence in multiprocessor memory systems

Software coherence in multiprocessor memory systems
A Trace-Based Comparison of Shared Memory Multiprocessor Architectures

A Trace-Based Comparison of Shared Memory Multiprocessor Architectures

Minerva: An Adaptive Subblock Coherence Protocol for Improved SMP Performance

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Degenerate Sharing

ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Algorithms for memory hierarchies: advanced lectures

Algorithms for memory hierarchies: advanced lectures
Assessing cache false sharing effects by dynamic binary instrumentation

Proceedings of the Workshop on Binary Instrumentation and Applications
Dynamic cache contention detection in multi-threaded applications

Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Tackling cache-line stealing effects using run-time adaptation

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
SHERIFF: precise detection and automatic mitigation of false sharing

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Toward predictable performance in software packet-processing platforms

NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Latencies of conflicting writes on contemporary multicore architectures

PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
Whose cache line is it anyway?: operating system support for live detection and repair of false sharing

Proceedings of the 8th ACM European Conference on Computer Systems
Parallelizing Sequential Programs with Statistical Accuracy Tests

ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Probabilistic Embedded Computing
Detection of false sharing using machine learning

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
PREDATOR: predictive false sharing detection

Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
HEAP: A Highly Efficient Adaptive multi-Processor framework

Microprocessors & Microsystems

Quantified Score

Hi-index	0.00

Visualization

Abstract

False sharing occurs when processors in a shared-memory parallel system make references to different data objects within the same coherence block (cache line or page), thereby inducing "unnecessary" coherence operations. False sharing is widely believed to be a serious problem for parallel program performance, but a precise definition and quantification of the problem has proven to be elusive. We explain why. In the process, we present a variety of possible definitions for false sharing, and discuss the merits and drawbacks of each. Our discussion is based on experience gained during a four-year study of multiprocessor memory architecture and its effect on the behavior of applications in a sixteen-program suite. Using trace-based simulation, we present experimental evidence to support the claim that false sharing is a serious problem. Unfortunately, we find that the various computationally tractable approaches to quantifying the problem are either heuristic in nature, or fail to agree with intuition.