Sorting is an important component of many applications, and parallel sorting algorithms have been studied extensively over the last three decades. One of the earliest parallel sorting algorithms is bitonic sort, which is represented by a sorting network consisting of multiple butterfly stages. This paper studies bitonic sort on modern parallel machines, which are relatively coarse-grained and consist of only a modest number of nodes, so that many data elements must be mapped to each processor. In such a setting, optimizing bitonic sort becomes a question of mapping the data elements to processing nodes (the data layout) so that communication is minimized. We developed a bitonic sort algorithm that minimizes the number of communication steps and optimizes the local computation. Experimental results collected on a 64-node Meiko CS-2 show that the resulting algorithm is faster than previous implementations.
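For orientation, the textbook bitonic sorting network and the effect of a simple data layout can be sketched as follows. This is only an illustration, not the optimized algorithm developed in the paper: the function names `bitonic_sort` and `remote_steps_blocked` are hypothetical, the input length and the processor count are assumed to be powers of two, and a plain blocked layout (consecutive runs of n/p elements per processor) is assumed rather than the layout the paper derives.

```python
def bitonic_sort(keys):
    """Sort a list with the textbook bitonic network (length must be a power of two)."""
    a = list(keys)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:              # merge stage: produces sorted runs of length k
        j = k // 2
        while j >= 1:          # one column of compare-exchange operations
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Runs alternate direction so that neighbours form bitonic sequences.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def remote_steps_blocked(n, p):
    """Count network columns whose partners sit on different processors.

    Assumes a blocked layout: processor q holds elements q*(n//p) .. (q+1)*(n//p)-1.
    Returns (remote_columns, total_columns).
    """
    block = n // p
    remote = total = 0
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            total += 1
            if j >= block:     # partner i ^ j falls outside the local block
                remote += 1
            j //= 2
        k *= 2
    return remote, total


if __name__ == "__main__":
    print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
    print(remote_steps_blocked(16, 4))
```

Under this assumed blocked layout, every column with compare distance of at least n/p requires communication; changing the layout between stages is one way to reduce how often that happens, which is the kind of trade-off the abstract refers to.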