This paper presents the first parallel sort algorithm for shared-memory MIMD (multiple-instruction, multiple-data-stream) multiprocessors whose theoretical and measured speedup is near linear. It is based on a novel asynchronous parallel merge that evenly partitions the data to be merged among any number of processors. A benchmark sorting algorithm is proposed that uses this merge to remove the linear-time bottleneck inherent in previous multiprocessor sorts. This sort, when applied to a data set on p processors, has a time complexity of O((n log n)/p) + O((n log p)/p) and a space complexity of 2n, where n is the number of keys being sorted. Evaluations of the merge and benchmark sort algorithms on a 12-processor Sequent Balance 21000 system demonstrate near-linear speedup when compared with sequential Quicksort.
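The idea of a merge that splits its work evenly among p processors can be illustrated with a short sketch. The abstract does not describe how the paper's merge partitions its input, so the code below swaps in a generic binary-search ("co-ranking") split and should not be read as the authors' algorithm. The names `co_rank` and `parallel_merge` are hypothetical, and Python threads are used only to show the structure; they do not reproduce the measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge


def co_rank(k, a, b):
    """Return (i, j) with i + j == k such that a[:i] and b[:j] together
    hold the k smallest elements of the merged output (ties go to a)."""
    lo = max(0, k - len(b))   # must take at least this many from a
    hi = min(k, len(a))       # cannot take more than this many from a
    while lo < hi:
        i = (lo + hi) // 2
        j = k - i
        if a[i] <= b[j - 1]:
            lo = i + 1        # a[i] precedes b[j-1], so take more from a
        else:
            hi = i
    return lo, k - lo


def parallel_merge(a, b, p):
    """Merge sorted lists a and b using p workers (hypothetical helper).

    Each worker produces a contiguous slice of the output, and the
    slice sizes differ by at most one element."""
    n = len(a) + len(b)
    # Split points for output ranks 0, n/p, 2n/p, ..., n.
    cuts = [co_rank(w * n // p, a, b) for w in range(p + 1)]

    def merge_chunk(w):
        (i0, j0), (i1, j1) = cuts[w], cuts[w + 1]
        return list(merge(a[i0:i1], b[j0:j1]))

    with ThreadPoolExecutor(max_workers=p) as pool:
        chunks = list(pool.map(merge_chunk, range(p)))
    return [x for chunk in chunks for x in chunk]
```

For example, `parallel_merge([1, 3, 5], [2, 4, 6], 2)` gives the first worker the slices `[1, 3]` and `[2]` and the second worker `[5]` and `[4, 6]`, so each produces three output elements. Because the workers never depend on one another after the split points are computed, no synchronization is needed during the merge itself.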