This paper provides a detailed investigation of the latency penalties caused by repeated writes from different threads of a parallel program to nearby memory locations. When such writes fall into the same cache line, which is then held in the caches of multiple processors, the so-called false sharing effect can be observed. Because the cache hierarchies of contemporary processor architectures operate at cache-line granularity, this effect can needlessly slow down parallel code even though the threads write to distinct data. In this contribution, a benchmark that allows quantitative estimates of the consequences of false sharing is presented. The results show that multicore architectures with a shared cache can reduce the unwanted effects of false sharing.
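The core condition described above, distinct write targets landing on one cache line, can be illustrated with a small address-mapping model. This is a hypothetical sketch and not the paper's benchmark: it assumes 64-byte cache lines (typical for x86-64, but not stated in the abstract) and per-thread counters that do not straddle a line boundary. The names `cache_line`, `counter_addresses`, `padded_stride` and `false_sharing_pairs` are illustrative only. It reports which threads would collide; it says nothing about the timing cost, which is what the benchmark measures on real hardware.

```python
# Hypothetical model of cache-line mapping; not the paper's benchmark.
# Assumption: 64-byte cache lines (typical for x86-64, not given in the abstract).
LINE_SIZE = 64


def cache_line(address: int, line_size: int = LINE_SIZE) -> int:
    """Index of the cache line that contains the byte at `address`."""
    return address // line_size


def counter_addresses(base: int, n_threads: int, stride: int) -> list[int]:
    """Start address of each thread's counter when counters lie `stride` bytes apart.

    Assumption: each counter fits inside one line (no line-straddling).
    """
    return [base + t * stride for t in range(n_threads)]


def padded_stride(elem_size: int, line_size: int = LINE_SIZE) -> int:
    """Round an element size up to a whole number of cache lines."""
    return -(-elem_size // line_size) * line_size


def false_sharing_pairs(addresses: list[int], line_size: int = LINE_SIZE) -> list[tuple[int, int]]:
    """Pairs of threads whose distinct write targets fall on the same cache line.

    Identical addresses are true sharing, not false sharing, so they are skipped.
    """
    pairs = []
    for i in range(len(addresses)):
        for j in range(i + 1, len(addresses)):
            distinct = addresses[i] != addresses[j]
            same_line = cache_line(addresses[i], line_size) == cache_line(addresses[j], line_size)
            if distinct and same_line:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    packed = counter_addresses(0x1000, 4, 8)  # 8-byte counters, no padding
    padded = counter_addresses(0x1000, 4, padded_stride(8))  # one line per counter
    print(false_sharing_pairs(packed))  # all six thread pairs share line 0x40
    print(false_sharing_pairs(padded))  # []
```

Padding each thread's data to a full line, as in the `padded` case, is a common software mitigation. The paper's observation that a shared cache reduces the effect concerns the hardware side, which this model deliberately does not capture.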