An open enviornment for building parallel programming systems
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Simple but effective techniques for NUMA memory management
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
NUMA policies and their relation to memory architecture
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Distributed Shared Memory: A Survey of Issues and Algorithms
Computer - Distributed computing systems: separate resources acting as one
Experimental comparison of memory management policies for NUMA multiprocessors
ACM Transactions on Computer Systems (TOCS)
Implementation and performance of Munin
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
The DASH prototype: implementation and performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Adjustable block size coherent caches
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
ACM SIGOPS Operating Systems Review
Software coherence in multiprocessor memory systems
Software coherence in multiprocessor memory systems
A Trace-Based Comparison of Shared Memory Multiprocessor Architectures
A Trace-Based Comparison of Shared Memory Multiprocessor Architectures
Minerva: An Adaptive Subblock Coherence Protocol for Improved SMP Performance
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Algorithms for memory hierarchies: advanced lectures
Algorithms for memory hierarchies: advanced lectures
Assessing cache false sharing effects by dynamic binary instrumentation
Proceedings of the Workshop on Binary Instrumentation and Applications
Dynamic cache contention detection in multi-threaded applications
Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Tackling cache-line stealing effects using run-time adaptation
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
SHERIFF: precise detection and automatic mitigation of false sharing
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Toward predictable performance in software packet-processing platforms
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Latencies of conflicting writes on contemporary multicore architectures
PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
Proceedings of the 8th ACM European Conference on Computer Systems
Parallelizing Sequential Programs with Statistical Accuracy Tests
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Probabilistic Embedded Computing
Detection of false sharing using machine learning
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
PREDATOR: predictive false sharing detection
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
HEAP: A Highly Efficient Adaptive multi-Processor framework
Microprocessors & Microsystems
Hi-index | 0.00 |
False sharing occurs when processors in a shared-memory parallel system make references to different data objects within the same coherence block (cache line or page), thereby inducing "unnecessary" coherence operations. False sharing is widely believed to be a serious problem for parallel program performance, but a precise definition and quantification of the problem has proven to be elusive. We explain why. In the process, we present a variety of possible definitions for false sharing, and discuss the merits and drawbacks of each. Our discussion is based on experience gained during a four-year study of multiprocessor memory architecture and its effect on the behavior of applications in a sixteen-program suite. Using trace-based simulation, we present experimental evidence to support the claim that false sharing is a serious problem. Unfortunately, we find that the various computationally tractable approaches to quantifying the problem are either heuristic in nature, or fail to agree with intuition.