Efficient and correct execution of parallel programs that share memory
ACM Transactions on Programming Languages and Systems (TOPLAS)
Designing memory consistency models for shared-memory multiprocessors
Designing memory consistency models for shared-memory multiprocessors
Optimizing parallel programs with explicit synchronization
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Thin locks: featherweight synchronization for Java
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Retrospective: memory consistency and event ordering in scalable shared-memory multiprocessors
25 years of the international symposia on Computer architecture (selected papers)
Basic compiler algorithms for parallel programs
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Weak ordering—a new definition
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Scalable queue-based spin locks with timeout
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Exploiting deferred destruction: an analysis of read-copy-update techniques in operating system kernels
Threads cannot be implemented as a library
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Foundations of the C++ concurrency memory model
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Memory models: a case for rethinking parallel languages and hardware
Communications of the ACM
Transactional memory should be an implementation technique, not a programming interface
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Laws of order: expensive synchronization in concurrent algorithms cannot be eliminated
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Performance implications of fence-based memory models
Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Stability in weak memory models
CAV'11 Proceedings of the 23rd international conference on Computer aided verification
Can seqlocks get along with programming language memory models?
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Hi-index | 0.02 |
C or C++ programs relying on the pthreads interface for concurrency are required to use a specified set of functions to avoid data races, and to ensure memory visibility across threads. Although the detailed rules are not completely, it is not hard to refine them to a simple set of clear and uncontroversial rules for at least a subset of the C language that excludes structures (and hence bit-fields). We precisely address the question of how locks in this subset must be implemented, and particularly when other memory operations can be reordered with respect to locks. This impacts the memory fences required in lock implementations, and hence has significant performance impact. Along the way, we show that a significant class of common compiler transformations are actually safe in the presence of pthreads, something which appears to have received minimal attention in the past. We show that, surprisingly to us, the reordering constraints are not symmetric for the lock and unlock operations. In particular, it is not always safe to move memory operations into a locked region by delaying them past a pthread_mutex_lock() call, but it is safe to move them into such a region by advancing them to before a pthread_mutex_unlock() call. We believe that this was not previously recognized, and there is evidence that it is under-appreciated among implementors of thread libraries. Although our precise arguments are expressed in terms of statement reordering within a small subset language, we believe that our results capture the situation for a full C/C++ implementation. We also argue that our results are insensitive to the details of our literal (and reasonable, though possibly unintended) interpretation of the pthread standard. We believe that they accurately reflect hardware memory ordering constraints in addition to compiler constraints. And they appear to have implications beyond pthread environments.