The mutual exclusion problem: part I—a theory of interprocess communication
Journal of the ACM (JACM)
Algorithms for mutual exclusion
Algorithms for mutual exclusion
A fast mutual exclusion algorithm
ACM Transactions on Computer Systems (TOCS)
Efficient synchronization of multiprocessors with shared memory
PODC '86 Proceedings of the fifth annual ACM symposium on Principles of distributed computing
The information structure of distributed mutual exclusion algorithms
ACM Transactions on Computer Systems (TOCS)
International Journal of Parallel Programming
Distributing Hot-Spot Addressing in Large-Scale Multiprocessors
IEEE Transactions on Computers
An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
The Wisconsin multicube: a new large-scale cache-coherent multiprocessor
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Two algorithms for barrier synchronization
International Journal of Parallel Programming
A tree-based algorithm for distributed mutual exclusion
ACM Transactions on Computer Systems (TOCS)
Characterizing the synchronization behavior of parallel programs
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Efficient synchronization primitives for large-scale cache-coherent multiprocessors
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Adaptive backoff synchronization techniques
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors
IEEE Transactions on Computers
Multi-model parallel programming in Psyche
PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
A methodology for implementing highly concurrent data structures
PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
Highly parallel computing
A √N algorithm for mutual exclusion in decentralized systems
ACM Transactions on Computer Systems (TOCS)
Weak ordering—a new definition
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Synchronization with multiprocessor caches
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Synchronization in Distributed Programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
A New Solution to Lamport's Concurrent Programming Problem Using Small Shared Variables
ACM Transactions on Programming Languages and Systems (TOPLAS)
An optimal algorithm for mutual exclusion in computer networks
Communications of the ACM
Medusa: an experiment in distributed operating system structure
Communications of the ACM
Synchronization with eventcounts and sequencers
Communications of the ACM
A new solution of Dijkstra's concurrent programming problem
Communications of the ACM
Solution of a problem in concurrent programming control
Communications of the ACM
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Dynamic decentralized cache schemes for MIMD parallel processors
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
An economical solution to the cache coherence problem
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors
Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors
Efficient Doacross execution on distributed shared-memory multiprocessors
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Synchronization without contention
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Scalable reader-writer synchronization for shared-memory multiprocessors
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Comparison of hardware and software cache coherence schemes
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Performance issues in non-blocking synchronization on shared-memory multiprocessors
PODC '92 Proceedings of the eleventh annual ACM symposium on Principles of distributed computing
Subset barrier synchronization on a private-memory parallel system
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
The RP3 program visualization environment
IBM Journal of Research and Development
The design and implementation of HoME
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Scheduling in parallel systems with a hierarchical organization of tasks
ICS '92 Proceedings of the 6th international conference on Supercomputing
Cooperative shared memory: software and hardware for scalable multiprocessors
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
An effective synchronization network for hot-spot accesses
ACM Transactions on Computer Systems (TOCS)
Benchmark workload generation and performance characterization of multiprocessors
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Willow: a scalable shared memory multiprocessor
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
ACM Transactions on Computer Systems (TOCS)
Waiting algorithms for synchronization in large-scale multiprocessors
ACM Transactions on Computer Systems (TOCS)
Integrating message-passing and shared-memory: early experience
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Cooperative shared memory: software and hardware for scalable multiprocessors
ACM Transactions on Computer Systems (TOCS)
Fast, scalable synchronization with minimal hardware support
PODC '93 Proceedings of the twelfth annual ACM symposium on Principles of distributed computing
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An approach to scalability study of shared memory parallel systems
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Spin-Lock Synchronization on the Butterfly and KSR1
IEEE Parallel & Distributed Technology: Systems & Technology
A comparison of message passing and shared memory architectures for data parallel programs
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Time bounds for mutual exclusion and related problems
STOC '94 Proceedings of the twenty-sixth annual ACM symposium on Theory of computing
Reactive synchronization algorithms for multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Where is time spent in message-passing and shared-memory programs?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A performance evaluation of lock-free synchronization protocols
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
Using k-exclusion to implement resilient, scalable shared objects (extended abstract)
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
A Hierarchical Task Queue Organization for Shared-Memory Multiprocessor Systems
IEEE Transactions on Parallel and Distributed Systems
Distributed Hardwired Barrier Synchronization for Scalable Multiprocessor Clusters
IEEE Transactions on Parallel and Distributed Systems
High performance synchronization algorithms for multiprogrammed multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
ACM Transactions on Computer Systems (TOCS)
The communication requirements of mutual exclusion
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Efficient techniques for fast nested barrier synchronization
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Techniques for reducing overheads of shared-memory multiprocessing
ICS '95 Proceedings of the 9th international conference on Supercomputing
Lock-free linked lists using compare-and-swap
Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing
Distributed Shared Abstractions (DSA) on Multiprocessors
IEEE Transactions on Software Engineering
Coherent network interfaces for fine-grain communication
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Selecting locking primitives for parallel programming
Communications of the ACM
Synchronization and communication in the T3E multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Scheduler-conscious synchronization
ACM Transactions on Computer Systems (TOCS)
An efficient recovery-based spin lock protocol for preemptive shared-memory multiprocessors
PODC '96 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing
Simple, fast, and practical non-blocking and blocking concurrent queue algorithms
PODC '96 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Distributed shared memory systems with improved barrier synchronization and data transfer
ICS '97 Proceedings of the 11th international conference on Supercomputing
The interaction of parallel programming constructs and coherence protocols
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Efficient synchronization: let them eat QOLB
Proceedings of the 24th annual international symposium on Computer architecture
A Prioritized Multiprocessor Spin Lock
IEEE Transactions on Parallel and Distributed Systems
Characterizing Distributed Shared Memory Performance: A Case Study of the Convex SPP1000
IEEE Transactions on Parallel and Distributed Systems
Combining funnels: a new twist on an old tale…
PODC '98 Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing
Thin locks: featherweight synchronization for Java
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
A time complexity lower bound for randomized implementations of some shared objects
PODC '98 Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing
A methodology and an evaluation of the SGI Origin2000
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Designing Tree-Based Barrier Synchronization on 2D Mesh Networks
IEEE Transactions on Parallel and Distributed Systems
Evaluating the Effect of Coherence Protocols on the Performance of Parallel Programming Constructs
International Journal of Parallel Programming
An Application-Driven Study of Parallel System Overheads and Network Bandwidth Requirements
IEEE Transactions on Parallel and Distributed Systems
A simple local-spin group mutual exclusion algorithm
Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing
Scalable concurrent priority queue algorithms
Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing
Evaluating synchronization on shared address space multiprocessors: methodology and performance
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
EMERALDS: a small-memory real-time microkernel
Proceedings of the seventeenth ACM symposium on Operating systems principles
An efficient meta-lock for implementing ubiquitous synchronization
Proceedings of the 14th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
A study of locking objects with bimodal fields
Proceedings of the 14th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Performance of Hierarchical Processor Scheduling in Shared-Memory Multiprocessor Systems
IEEE Transactions on Computers
ACM Transactions on Computer Systems (TOCS)
Priority Queues and Sorting Methods for Parallel Simulation
IEEE Transactions on Software Engineering
System-on-a-chip processor synchronization support in hardware
Proceedings of the conference on Design, automation and test in Europe
Lock-free scheduling of logical processes in parallel simulation
Proceedings of the fifteenth workshop on Parallel and distributed simulation
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Scalable queue-based spin locks with timeout
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Barrier Synchronization on Wormhole-Routed Networks
IEEE Transactions on Parallel and Distributed Systems
An improved lower bound for the time complexity of mutual exclusion
Proceedings of the twentieth annual ACM symposium on Principles of distributed computing
A scalable and flexible data synchronization scheme for embedded HW-SW shared-memory systems
Proceedings of the 14th international symposium on Systems synthesis
A system-on-a-chip lock cache with task preemption support
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
EMERALDS: A Small-Memory Real-Time Microkernel
IEEE Transactions on Software Engineering
Impact of Workload and System Parameters on Next Generation Cluster Scheduling Mechanisms
IEEE Transactions on Parallel and Distributed Systems
A Simple Local-Spin Group Mutual Exclusion Algorithm
IEEE Transactions on Parallel and Distributed Systems
Multi-protocol active messages on a cluster of SMP's
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Nonatomic mutual exclusion with local spinning
Proceedings of the twenty-first annual symposium on Principles of distributed computing
Non-blocking timeout in scalable queue-based spin locks
Proceedings of the twenty-first annual symposium on Principles of distributed computing
WOSP '02 Proceedings of the 3rd international workshop on Software and performance
Affinity scheduling of unbalanced workloads
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Application-specific protocols for user-level shared memory
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Transactional lock-free execution of lock-based programs
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A space- and time-efficient local-spin spin lock
Information Processing Letters
International Journal of Parallel Programming
Two-Phase Barrier: A Synchronization Primitive for Improving the Processor Utilization
International Journal of Parallel Programming
Reducing Hot-Spot Contention in Shared-Memory Multiprocessor Systems
IEEE Concurrency
Characterizing the Performance of Algorithms for Lock-Free Objects
IEEE Transactions on Computers
A Unified Formalization of Four Shared-Memory Models
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Performance of Barrier Synchronization Methods in a Multiaccess Network
IEEE Transactions on Parallel and Distributed Systems
A Circular List-Based Mutual Exclusion Scheme for Large Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A Reliable Hardware Barrier Synchronization Scheme
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Fast and Scalable Mutual Exclusion
Proceedings of the 13th International Symposium on Distributed Computing
Adaptive Mutual Exclusion with Local Spinning
DISC '00 Proceedings of the 14th International Conference on Distributed Computing
A Practical Multi-word Compare-and-Swap Operation
DISC '02 Proceedings of the 16th International Conference on Distributed Computing
Formal Verification of the MCS List-Based Queuing Lock
ASIAN '99 Proceedings of the 5th Asian Computing Science Conference on Advances in Computing Science
Symmetric Symbolic Safety-Analysis of Concurrent Software with Pointer Data Structures
FORTE '02 Proceedings of the 22nd IFIP WG 6.1 International Conference Houston on Formal Techniques for Networked and Distributed Systems
Performance Evaluation of the Omni OpenMP Compiler
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Automatic Verification of Pointer Data-Structure Systems for All Numbers of Processes
FM '99 Proceedings of the World Congress on Formal Methods in the Development of Computing Systems - Volume I
Efficient synchronization for nonuniform communication architectures
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
An improved lower bound for the time complexity of mutual exclusion
Distributed Computing - Special issue: Selected papers from PODC '01
Inferential queueing and speculative push for reducing critical communication latencies
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Using memory-mapped network interfaces to improve the performance of distributed shared memory
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Hierarchical Backoff Locks for Nonuniform Communication Architectures
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Real-time scalability of nested spin locks
RTCSA '95 Proceedings of the 2nd International Workshop on Real-Time Computing Systems and Applications
Termination detection in data-driven parallel computations/applications
Journal of Parallel and Distributed Computing
Local-spin Mutual Exclusion Using Fetch-and-φ Primitives
ICDCS '03 Proceedings of the 23rd International Conference on Distributed Computing Systems
Adaptive and efficient abortable mutual exclusion
Proceedings of the twenty-second annual symposium on Principles of distributed computing
Managing Concurrent Access for Shared Memory Active Messages
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
(De-) Clustering Objects for Multiprocessor System Software
IWOOOS '95 Proceedings of the 4th International Workshop on Object-Orientation in Operating Systems
Software for multiprocessor networks on chip
Networks on chip
Shared-memory mutual exclusion: major research trends since 1986
Distributed Computing - Papers in celebration of the 20th anniversary of PODC
Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects
IEEE Transactions on Parallel and Distributed Systems
Thin locks: featherweight Synchronization for Java
ACM SIGPLAN Notices - Best of PLDI 1979-1999
IEEE Transactions on Software Engineering
A scalable lock-free stack algorithm
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Java server performance: a case study of building efficient, scalable JVMs
IBM Systems Journal
Performance Evaluation of Task Pools Based on Hardware Synchronization
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
A new fast-path mechanism for mutual exclusion
Distributed Computing
A linear-time algorithm for optimal barrier placement
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd annual international symposium on Computer Architecture
Using local-spin k-exclusion algorithms to improve wait-free object implementations
Distributed Computing
Fast synchronization for chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Fast synchronization on shared-memory multiprocessors: An architectural approach
Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
Inferential queueing and speculative push
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
The java.util.concurrent synchronizer framework
Science of Computer Programming - Special issue: Concurrency and synchronization in Java programs
Landing OpenMP on Cyclops-64: an efficient mapping of OpenMP to a many-core system-on-a-chip
Proceedings of the 3rd conference on Computing frontiers
Power/performance hardware optimization for synchronization intensive applications in MPSoCs
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Large scale Itanium® 2 processor OLTP workload characterization and optimization
DaMoN '06 Proceedings of the 2nd international workshop on Data management on new hardware
Hotswapping Linux kernel modules
Journal of Systems and Software
An Ω(n log n) lower bound on the cost of mutual exclusion
Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing
An efficient synchronization technique for multiprocessor systems on-chip
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications, systems and architecture
Energy implications of multiprocessor synchronization
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Distributed computing using Java: a comparison of two server designs
Journal of Systems Architecture: the EUROMICRO Journal
Dense Gaussian networks: suitable topologies for on-chip multiprocessors
International Journal of Parallel Programming
A case study of multi-threading in the embedded space
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
A tight bound on remote reference time complexity of mutual exclusion in the read-modify-write model
Journal of Parallel and Distributed Computing
On the energy efficiency of synchronization primitives for shared-memory single-chip multiprocessors
Proceedings of the 17th ACM Great Lakes symposium on VLSI
Concurrent programming without locks
ACM Transactions on Computer Systems (TOCS)
A generic local-spin fetch-and-φ-based mutual exclusion algorithm
Journal of Parallel and Distributed Computing
Quantitative performance analysis of the SPEC OMPM2001 benchmarks
Scientific Programming - OpenMP
Self-tuning reactive diffracting trees
Journal of Parallel and Distributed Computing
Efficient self-tuning spin-locks using competitive analysis
Journal of Systems and Software
Evaluating synchronization techniques for light-weight multithreaded/multicore architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Proceedings of the 34th annual international symposium on Computer architecture
Performance pathologies in hardware transactional memory
Proceedings of the 34th annual international symposium on Computer architecture
Understanding Tradeoffs in Software Transactional Memory
Proceedings of the International Symposium on Code Generation and Optimization
Performance issues in parallelized network protocols
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Experiences with locking in a NUMA multiprocessor operating system kernel
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Enabling scalability and performance in a large scale CMP environment
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Fine grained kernel logging with KLogger: experience and insights
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Implementing MPI-IO Atomic Mode and Shared File Pointers Using MPI One-Sided Communication
International Journal of High Performance Computing Applications
Proceedings of the 21st annual international conference on Supercomputing
The mechanics of in-kernel synchronization for a scalable microkernel
ACM SIGOPS Operating Systems Review
Lightweight barrier-based parallelization support for non-cache-coherent MPSoC platforms
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Light-weight synchronization for inter-processor communication acceleration on embedded MPSoCs
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
Tight RMR lower bounds for mutual exclusion and other problems
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing
Combinable memory-block transactions
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Dynamic recognition of synchronization operations for improved data race detection
ISSTA '08 Proceedings of the 2008 international symposium on Software testing and analysis
The Weakest Failure Detector for Message Passing Set-Agreement
DISC '08 Proceedings of the 22nd international symposium on Distributed Computing
Critical sections: re-emerging scalability concerns for database storage engines
Proceedings of the 4th international workshop on Data management on new hardware
Formal Analysis of the Bakery Protocol with Consideration of Nonatomic Reads and Writes
ICFEM '08 Proceedings of the 10th International Conference on Formal Methods and Software Engineering
A Comparison of the M-PCP, D-PCP, and FMLP on LITMUS^RT
OPODIS '08 Proceedings of the 12th International Conference on Principles of Distributed Systems
Techniques for efficient placement of synchronization primitives
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Communications of the ACM - Security in the Browser
HW/SW methodologies for synchronization in FPGA multiprocessors
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Shore-MT: a scalable storage manager for the multicore era
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Extending concurrency of transactional memory programs by using value prediction
Proceedings of the 6th ACM conference on Computing frontiers
Enhancing operating system support for multicore processors by using hardware performance monitoring
ACM SIGOPS Operating Systems Review
Limited early value communication to improve performance of transactional memory
Proceedings of the 23rd international conference on Supercomputing
Adaptive mutual exclusion with local spinning
Distributed Computing
Rigel: an architecture and scalable programming interface for a 1000-core accelerator
Proceedings of the 36th annual international symposium on Computer architecture
A new look at the roles of spinning and blocking
Proceedings of the Fifth International Workshop on Data Management on New Hardware
Verification of Parameterized Systems with Combinations of Abstract Domains
FMOODS '09/FORTE '09 Proceedings of the Joint 11th IFIP WG 6.1 International Conference FMOODS '09 and 29th IFIP WG 6.1 International Conference FORTE '09 on Formal Techniques for Distributed Systems
Brief announcement: distributed phase synchronization of dynamic set of processes
Proceedings of the 28th ACM symposium on Principles of distributed computing
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
A Methodology to Characterize Critical Section Bottlenecks in DSM Multiprocessors
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Avoiding unbounded priority inversion in barrier protocols using gang priority management
Proceedings of the 7th International Workshop on Java Technologies for Real-Time and Embedded Systems
The multikernel: a new OS architecture for scalable multicore systems
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
A scalable lock-free stack algorithm
Journal of Parallel and Distributed Computing
Analyzing lock contention in multithreaded applications
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Efficient Reduction Techniques for Systems with Many Components
Electronic Notes in Theoretical Computer Science (ENTCS)
CoreDet: a compiler and runtime system for deterministic multithreaded execution
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Decoupling contention management from scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Iterative computations with ordered read-write locks
Journal of Parallel and Distributed Computing
Busy-wait barrier synchronization using distributed counters with local sensor
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
On-chip communication and synchronization mechanisms with cache-integrated network interfaces
Proceedings of the 7th ACM international conference on Computing frontiers
Supporting speculative parallelization in the presence of dynamic data structures
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Smartlocks: lock acquisition scheduling for self-aware synchronization
Proceedings of the 7th international conference on Autonomic computing
Lightweight, robust adaptivity for software transactional memory
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Flat combining and the synchronization-parallelism tradeoff
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
The k-bakery: local-spin k-exclusion using non-atomic reads and writes
Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing
Group mutual exclusion in O(log n) RMR
Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing
Constant RMR solutions to reader writer synchronization
Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Your computer is already a distributed system. why isn't your OS?
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
Optimizing collective communication on multicores
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Corey: an operating system for many cores
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Performance characteristics of OpenMP language constructs on a many-core-on-a-chip architecture
IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
A study of shared-memory mutual exclusion protocols using CADP
FMICS'10 Proceedings of the 15th international conference on Formal methods for industrial critical systems
Collective operations in NEC's high-performance MPI libraries
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
HPP controller: a system controller for high performance computing
Frontiers of Computer Science in China
An analysis of Linux scalability to many cores
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Laws of order: expensive synchronization in concurrent algorithms cannot be eliminated
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Architectural Support for Fair Reader-Writer Locking
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
SpiceC: scalable parallelism via implicit copying and explicit commit
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Distributed generalized dynamic barrier synchronization
ICDCN'11 Proceedings of the 12th international conference on Distributed computing and networking
Parallel implementations of Brunotte's algorithm
Journal of Parallel and Distributed Computing
Efficient synchronization for embedded on-chip multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A case for scaling applications to many-core with OS clustering
Proceedings of the sixth conference on Computer systems
Analysis and performance results of computing betweenness centrality on IBM Cyclops64
The Journal of Supercomputing
A survey of hard real-time scheduling for multiprocessor systems
ACM Computing Surveys (CSUR)
Lightweight parallel accumulators using C++ templates
Proceedings of the 4th International Workshop on Multicore Software Engineering
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Brief announcement: a partitioned ticket lock
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
A highly-efficient wait-free universal construction
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
A complexity separation between the cache-coherent and distributed shared memory models
Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing
Comparison of lock thrashing avoidance methods and its performance implications for lock design
Proceedings of the third international workshop on Large-scale system and application performance
Dynamic adaptive scheduling for virtual machines
Proceedings of the 20th international symposium on High performance distributed computing
Scalable distributed inference of dynamic user interests for behavioral targeting
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Multiple domain user personalization
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Symmetry-aware predicate abstraction for shared-variable concurrent programs
CAV'11 Proceedings of the 23rd international conference on Computer aided verification
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Reducing biased lock revocation by learning
Proceedings of the 6th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems
PLDS: Partitioning linked data structures for parallelism
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Preemption adaptivity in time-published queue-based spin locks
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Implementing byte-range locks using MPI one-sided communication
PVM/MPI'05 Proceedings of the 12th European PVM/MPI users' group conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Reducing model checking of the few to the one
ICFEM'06 Proceedings of the 8th international conference on Formal Methods and Software Engineering
Scalable inference in latent variable models
Proceedings of the fifth ACM international conference on Web search and data mining
Read/Write based fast-path transformation for FCFS mutual exclusion
SOFSEM'05 Proceedings of the 31st international conference on Theory and Practice of Computer Science
Speeding-up synchronizations in DSM multiprocessors
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
TACAS'05 Proceedings of the 11th international conference on Tools and Algorithms for the Construction and Analysis of Systems
Structure and algorithm for implementing OpenMP workshares
WOMPAT'04 Proceedings of the 5th international conference on OpenMP Applications and Tools: shared Memory Parallel Programming with OpenMP
Lock cohorting: a general technique for designing NUMA locks
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Revisiting the combining synchronization technique
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Transparent fault tolerance for grid applications
EGC'05 Proceedings of the 2005 European conference on Advances in Grid Computing
Non-blocking hashtables with open addressing
DISC'05 Proceedings of the 19th international conference on Distributed Computing
Low-Overhead, high-speed multi-core barrier synchronization
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
OPODIS'11 Proceedings of the 15th international conference on Principles of Distributed Systems
Tight time-space tradeoff for mutual exclusion
STOC '12 Proceedings of the forty-fourth annual ACM symposium on Theory of computing
A tight RMR lower bound for randomized mutual exclusion
STOC '12 Proceedings of the forty-fourth annual ACM symposium on Theory of computing
Cost of mutual exclusion with spin locks on multi-core CPUs
BICA'12 Proceedings of the 5th WSEAS congress on Applied Computing conference, and Proceedings of the 1st international conference on Biologically Inspired Computation
Reagents: expressing and composing fine-grained concurrency
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Counter-Example guided fence insertion under TSO
TACAS'12 Proceedings of the 18th international conference on Tools and Algorithms for the Construction and Analysis of Systems
Fighting back: using observability tools to improve the DBMS (not just diagnose it)
DBTest '12 Proceedings of the Fifth International Workshop on Testing Database Systems
Design, verification and applications of a new read-write lock algorithm
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Brief announcement: a tight RMR lower bound for randomized mutual exclusion
PODC '12 Proceedings of the 2012 ACM symposium on Principles of distributed computing
Proceedings of the 39th Annual International Symposium on Computer Architecture
Counterexample-guided abstraction refinement for symmetric concurrent programs
Formal Methods in System Design
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Journal of Experimental Algorithmics (JEA)
The Journal of Supercomputing
Fast asymmetric thread synchronization
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
RMR-efficient randomized abortable mutual exclusion
DISC'12 Proceedings of the 26th international conference on Distributed Computing
Abortable reader-writer locks are no more complex than abortable mutex locks
DISC'12 Proceedings of the 26th international conference on Distributed Computing
Pessimistic software lock-elision
DISC'12 Proceedings of the 26th international conference on Distributed Computing
Using hardware transactional memory to correct and simplify a readers-writer lock algorithm
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
A scalable lock manager for multicores
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Portable mapping of OpenMP to multicore embedded systems using MCA APIs
Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Science of Computer Programming
An O(1)-barriers optimal RMRs mutual exclusion algorithm: extended abstract
Proceedings of the 2013 ACM symposium on Principles of distributed computing
Proceedings of the Conference on Design, Automation and Test in Europe
Reducing contention through priority updates
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Efficient and truly passive MPI-3 RMA using InfiniBand atomics
Proceedings of the 20th European MPI Users' Group Meeting
Time analysable synchronisation techniques for parallelised hard real-time applications
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hybrid MPI: efficient message passing for multi-core systems
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Scalable SIMD-parallel memory allocation for many-core machines
The Journal of Supercomputing
HARS: A hardware-assisted runtime software for embedded many-core architectures
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
The scalable commutativity rule: designing scalable software for multicore processors
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Everything you always wanted to know about synchronization but were afraid to ask
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
A lightweight infrastructure for graph analytics
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
DANBI: dynamic scheduling of irregular stream programs for many-core systems
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Lightweight contention management for efficient compare-and-swap operations
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Network-aware data caching and prefetching for cloud-hosted metadata retrieval
NDM '13 Proceedings of the Third International Workshop on Network-Aware Data Management
Leveraging hardware message passing for efficient thread synchronization
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach
The Journal of Supercomputing
On Automation in the Verification of Software Barriers: Experience Report
Journal of Automated Reasoning
MultiLanes: providing virtualized storage for OS-level virtualization on many cores
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies
Elimination Trees and the Construction of Pools and Stacks
Theory of Computing Systems
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally-accessible flag variables, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Flag variables may be locally-accessible as a result of coherent caching, or by virtue of allocation in the local portion of physically distributed shared memory.
We present a new scalable algorithm for spin locks that generates O(1) remote references per lock acquisition, independent of the number of processors attempting to acquire the lock. Our algorithm provides reasonable latency in the absence of contention, requires only a constant amount of space per lock, and requires no hardware support other than a swap-with-memory instruction. We also present a new scalable barrier algorithm that generates O(1) remote references per processor reaching the barrier, and observe that two previously-known barriers can likewise be cast in a form that spins only on locally-accessible flag variables. None of these barrier algorithms requires hardware support beyond the usual atomicity of memory reads and writes.
We compare the performance of our scalable algorithms with other software approaches to busy-wait synchronization on both a Sequent Symmetry and a BBN Butterfly. Our principal conclusion is that contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors. The existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides a case against so-called “dance hall” architectures, in which shared memory locations are equally far from all processors.
From the Authors' Abstract
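The local-spinning idea in the abstract can be illustrated with a minimal sketch of a queue-based spin lock. This is not the authors' code: the names `QNode`, `QueueLock`, and `run_demo` are hypothetical, and the release path uses compare-and-swap for brevity, which is an assumption that goes beyond the swap-only requirement the abstract states. Each waiter spins only on a flag in its own node, and the releasing thread ends that spin with a single write.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical per-thread queue node; each waiter spins on its own `locked` flag.
struct QNode {
    std::atomic<QNode*> next{nullptr};
    std::atomic<bool> locked{false};
};

// Sketch of a queue-based spin lock (illustrative, not the paper's listing).
class QueueLock {
    std::atomic<QNode*> tail_{nullptr};

public:
    void acquire(QNode& me) {
        me.next.store(nullptr);
        me.locked.store(true);
        // Atomic swap: append this node to the queue and learn the predecessor.
        QNode* pred = tail_.exchange(&me);
        if (pred != nullptr) {
            pred->next.store(&me);        // link in behind the predecessor
            while (me.locked.load()) {    // spin only on our own flag
                std::this_thread::yield();
            }
        }
    }

    void release(QNode& me) {
        QNode* succ = me.next.load();
        if (succ == nullptr) {
            QNode* expected = &me;
            // Assumption: compare-and-swap is available. If we are still the
            // tail, the queue becomes empty and we are done.
            if (tail_.compare_exchange_strong(expected, nullptr)) {
                return;
            }
            // A successor has swapped itself in but not yet linked; wait for it.
            while ((succ = me.next.load()) == nullptr) {
                std::this_thread::yield();
            }
        }
        succ->locked.store(false);        // one write ends the successor's spin
    }
};

// Hypothetical demo: several threads increment a shared counter under the lock.
long run_demo(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    QueueLock lock;
    long counter = 0;                     // protected by `lock`
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            QNode node;                   // this thread's own queue node
            for (int i = 0; i < iters; ++i) {
                lock.acquire(node);
                ++counter;
                lock.release(node);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

Calling `run_demo(4, 20000)` should return 80000 if mutual exclusion holds. The sketch uses sequentially consistent atomics throughout and yields inside its spin loops so that it stays well behaved when threads outnumber cores; a tuned implementation would likely relax both choices.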