A parallelization of the Quicksort algorithm that is suitable for execution on a shared memory multiprocessor with an efficient implementation of the fetch-and-add operation is presented. The partitioning phase of Quicksort, which has been considered a serial bottleneck, is cooperatively executed in parallel by many processors through the use of fetch-and-add. The parallel algorithm maintains the in-place nature of Quicksort, thereby allowing internal sorting of large arrays. A class of fetch-and-add-based algorithms for dynamically scheduling processors to subproblems is presented. Adaptive scheduling algorithms in this class have low overhead and achieve effective processor load balancing. The basic algorithm is shown to execute in an average of O(log(N)) time on an N-processor PRAM (parallel random-access machine) assuming a constant-time fetch-and-add. Estimated speedups, based on simulations, are also presented for cases when the number of items to be sorted is much greater than the number of processors.
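To make the idea of cooperative partitioning concrete, the following is a minimal sketch of how several threads might share one partitioning pass through fetch-and-add. It is not the algorithm of the paper: the abstract describes an in-place method, whereas this simplified variant writes into a separate output buffer so that each claimed slot is written exactly once. The function name `parallel_partition`, the counters `next_input`, `next_low`, and `high_used`, and the use of `std::atomic::fetch_add` as a stand-in for a hardware fetch-and-add are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: partitions `in` around `pivot` into `out` using
// `num_threads` cooperating threads. Elements < pivot end up on the left,
// elements >= pivot on the right. Returns the split index.
// NOTE: out-of-place simplification; the paper's algorithm is in-place.
std::size_t parallel_partition(const std::vector<int>& in,
                               std::vector<int>& out,
                               int pivot,
                               unsigned num_threads) {
    assert(num_threads > 0);
    const std::size_t n = in.size();
    out.resize(n);

    std::atomic<std::size_t> next_input{0};  // next unclaimed input index
    std::atomic<std::size_t> next_low{0};    // next free slot from the left
    std::atomic<std::size_t> high_used{0};   // slots already taken from the right

    auto worker = [&]() {
        for (;;) {
            // Fetch-and-add hands each thread a unique input index.
            const std::size_t i = next_input.fetch_add(1);
            if (i >= n) break;
            const int v = in[i];
            if (v < pivot) {
                // Claim a unique slot growing from the left end.
                out[next_low.fetch_add(1)] = v;
            } else {
                // Claim a unique slot growing from the right end.
                out[n - 1 - high_used.fetch_add(1)] = v;
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();

    return next_low.load();  // number of elements < pivot
}
```

Because every fetch-and-add returns a distinct value, no two threads read the same input element or write the same output slot, so the pass needs no locks. The order of elements within each side depends on thread timing, which is acceptable for Quicksort since each side is sorted recursively afterwards. The same counter pattern could, under similar assumptions, be used to hand out subproblems to idle processors.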