Non-blocking programming on multi-core graphics processors: (extended abstract)

Authors:
Phuong Hoai Ha;Philippas Tsigas;Otto J. Anshus
Affiliations:
University of Tromsø, Tromsø, Norway;Chalmers University of Technology, Göteborg, Sweden;University of Tromsø, Tromsø, Norway
Venue:
ACM SIGARCH Computer Architecture News
Year:
2009

Abstract

This paper investigates the synchronization power of coalesced memory accesses, a family of memory access mechanisms introduced in recent large multicore architectures such as CUDA graphics processors. We first design three memory access models that capture the fundamental features of the new memory access mechanisms. We then prove the exact synchronization power of these models in terms of their consensus numbers. These tight results show that coalesced memory access mechanisms can facilitate strong synchronization between the threads of multicore processors without the need for synchronization primitives other than reads and writes. Moreover, based on the intrinsic features of recent GPU architectures, we construct strong synchronization objects, such as wait-free and t-resilient read-modify-write objects, for a general model of recent GPU architectures that lacks strong hardware synchronization primitives like test-and-set and compare-and-swap. Accesses to the wait-free objects have time complexity O(N), where N is the number of processes. Our results demonstrate that it is possible to construct wait-free synchronization mechanisms for GPUs without strong synchronization primitives in hardware, and hence that wait-free programming is possible for GPUs.
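To make the notion of a consensus object concrete, the following is a minimal C++ sketch of one-shot consensus built on compare-and-swap, the kind of strong hardware primitive the abstract says is not required. It is not the paper's construction and does not model coalesced memory accesses; the names `CasConsensus`, `decide`, and `kUndecided` are hypothetical and chosen only for illustration. The sketch assumes proposals are non-negative integers.

```cpp
#include <atomic>

// Hypothetical sentinel: marks the object as not yet decided.
// Assumes every proposal is a non-negative integer.
constexpr int kUndecided = -1;

// Illustrative one-shot consensus object built on compare-and-swap.
// This is NOT the paper's construction; it only shows the interface
// and the agreement property that a consensus object provides.
class CasConsensus {
public:
    // Every caller proposes a value and every caller receives the
    // same decided value, which is one of the proposed values.
    int decide(int proposal) {
        int expected = kUndecided;
        // Only the first compare-and-swap succeeds. On failure,
        // `expected` is overwritten with the value already decided.
        if (decided_.compare_exchange_strong(expected, proposal)) {
            return proposal;  // this caller's proposal won
        }
        return expected;      // adopt the earlier winner's value
    }

private:
    std::atomic<int> decided_{kUndecided};
};
```

Each call finishes in a bounded number of its own steps regardless of what other threads do, which is the wait-free property. The abstract's claim is that comparable synchronization power can be obtained on GPUs from reads and writes alone under the coalesced memory access models, at a cost of O(N) per access to the wait-free objects.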