In this paper we investigate how the performance and speedup of applications are affected by using non-blocking rather than blocking synchronisation in parallel systems. The results show that, for many applications, non-blocking synchronisation leads to significant speedups for a fairly large number of processors, while it never slows the applications down. As part of this investigation, the paper also provides a set of efficient and simple translations that show how typical blocking operations found in parallel applications, such as simple locks, queues and lock trees, can be translated into non-blocking equivalents that use hardware primitives common in modern multiprocessor systems. With these translations, the paper demonstrates that it is easy for the application designer/programmer to replace the blocking operations commonly found in such applications with non-blocking equivalents. For the empirical results, a set of representative applications running on a large-scale ccNUMA machine was used.
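To make the idea of translating a simple lock concrete, the following is a minimal sketch and not the paper's own translation. It assumes the shared state is a single counter and that the hardware primitive is compare-and-swap, exposed here through C++ `std::atomic`. The names `LockedCounter`, `LockFreeCounter` and `run_workers` are hypothetical and chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Blocking version: a simple lock protects the shared counter.
// A thread that is delayed while holding the lock delays all others.
struct LockedCounter {
    std::mutex m;
    long value = 0;

    void add(long delta) {
        std::lock_guard<std::mutex> guard(m);
        value += delta;
    }
    long get() {
        std::lock_guard<std::mutex> guard(m);
        return value;
    }
};

// Non-blocking version (illustrative assumption): the critical section
// is replaced by a compare-and-swap retry loop. A failed CAS means some
// other thread made progress, so the system as a whole never stalls.
struct LockFreeCounter {
    std::atomic<long> value{0};

    void add(long delta) {
        long seen = value.load(std::memory_order_relaxed);
        // On failure, compare_exchange_weak reloads `seen` with the
        // current value, so the loop simply retries with fresh data.
        while (!value.compare_exchange_weak(seen, seen + delta)) {
        }
    }
    long get() const { return value.load(); }
};

// Hypothetical driver: each of `threads` workers increments the
// counter `iters` times; the final value is returned.
template <typename Counter>
long run_workers(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    Counter counter;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) counter.add(1);
        });
    }
    for (auto& worker : pool) worker.join();
    return counter.get();
}
```

Calling `run_workers<LockedCounter>(4, 10000)` and `run_workers<LockFreeCounter>(4, 10000)` should both return the same total, which shows that the two variants are interchangeable from the caller's point of view. The sketch says nothing about relative performance; that depends on the machine and the contention level, which is what the empirical study measures.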