SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Sharing memory robustly in message-passing systems
PODC '90 Proceedings of the ninth annual ACM symposium on Principles of distributed computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Synchronization without contention
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Optimal strategies for spinning and blocking
Journal of Parallel and Distributed Computing
Simple, fast, and practical non-blocking and blocking concurrent queue algorithms
PODC '96 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing
Tornado: maximizing locality and concurrency in a shared memory multiprocessor operating system
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Scalable queue-based spin locks with timeout
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Design Challenges of Technology Scaling
IEEE Micro
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Evaluating MapReduce for Multi-core and Multiprocessor Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Amdahl's Law in the Multicore Era
Computer
Validity of the single processor approach to achieving large scale computing capabilities
AFIPS '67 (Spring) Proceedings of the April 18-20, 1967, spring joint computer conference
Performance Studies of Commercial Workloads on a Multi-core System
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
Factored operating systems (fos): the case for a scalable operating system for multicores
ACM SIGOPS Operating Systems Review
The multikernel: a new OS architecture for scalable multicore systems
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Flat combining and the synchronization-parallelism tradeoff
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Corey: an operating system for many cores
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
An analysis of Linux scalability to many cores
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Communications of the ACM
GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
Memory Performance And SPEC OpenMP scalability on quad-socket x86 64 systems
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part I
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Lock cohorting: a general technique for designing NUMA locks
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
TM2C: a software transactional memory for many-cores
Proceedings of the 7th ACM european conference on Computer Systems
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
The Art of Multiprocessor Programming, Revised Reprint
The Art of Multiprocessor Programming, Revised Reprint
Hi-index | 0.00 |
This paper presents the most exhaustive study of synchronization to date. We span multiple layers, from hardware cache-coherence protocols up to high-level concurrent software. We do so on different types of architectures, from single-socket -- uniform and non-uniform -- to multi-socket -- directory and broadcast-based -- many-cores. We draw a set of observations that, roughly speaking, imply that scalability of synchronization is mainly a property of the hardware.