Unnecessary sharing of cache lines among the threads of a program, caused by private data located in close proximity in memory, is a performance obstacle known as false sharing. Depending on how frequently this data is accessed and how threads are scheduled onto processor cores, it can lead to substantial overhead because of the latency induced by cache-line exchanges. Since processor hardware cannot distinguish these effects from real data exchange (true sharing), all measurement tools have to rely on heuristics for detection. In this paper, we describe an approach that uses dynamic binary instrumentation to estimate the number of unnecessary cache-line exchanges caused by false sharing, and a tool that helps the programmer identify the data structures involved as well as the code sections triggering false sharing. To evaluate the impact of false sharing, the estimated number of occurrences is translated into a temporal overhead. Results of our tool are presented for two small example codes and a real-world application.
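To make the distinction between true and false sharing concrete, the following is a minimal sketch of one possible counting heuristic over a memory-access trace. It is not the method of the paper, whose details the abstract does not give. The function names `classify_misses` and `overhead_ns`, the 64-byte line size, the trace format, and the per-exchange latency are all assumptions chosen for illustration. A miss after an invalidation counts as false sharing when the bytes the thread touches do not overlap the bytes other threads wrote in the meantime.

```python
from collections import defaultdict

LINE_SIZE = 64            # assumed cache-line size in bytes
ASSUMED_EXCHANGE_NS = 50  # hypothetical latency of one cache-line exchange


def classify_misses(trace, line_size=LINE_SIZE):
    """Count cold, true-sharing and false-sharing misses in a trace.

    trace: iterable of (thread_id, address, is_write, size) tuples.
    Simplified model: one private cache of unlimited size per thread,
    and accesses that never cross a cache-line boundary.
    """
    holders = defaultdict(set)  # line -> threads holding a valid copy
    stale = defaultdict(dict)   # line -> {thread: offsets written by others}
    counts = {"cold": 0, "true": 0, "false": 0}

    for thread, addr, is_write, size in trace:
        line, offset = divmod(addr, line_size)
        touched = set(range(offset, min(offset + size, line_size)))

        if thread not in holders[line]:
            if thread in stale[line]:
                # The copy was invalidated earlier: was the data we need changed?
                modified = stale[line].pop(thread)
                kind = "true" if touched & modified else "false"
            else:
                kind = "cold"  # first access by this thread to this line
            counts[kind] += 1
            holders[line].add(thread)

        if is_write:
            # A write invalidates every other valid copy of the line.
            for other in holders[line] - {thread}:
                stale[line][other] = set()
            holders[line] = {thread}
            # Record the written bytes for all threads with a stale copy.
            for modified in stale[line].values():
                modified |= touched

    return counts


def overhead_ns(counts, exchange_ns=ASSUMED_EXCHANGE_NS):
    """Translate the false-sharing count into an estimated time overhead."""
    return counts["false"] * exchange_ns
```

As a usage example, two threads that alternately update adjacent 8-byte counters at addresses 0 and 8 share one line, so every access after the first two is classified as a false-sharing miss. Moving the second counter to address 64 places it on its own line and removes these misses in this model.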