Partitioned parallel radix sort

Authors:
Shin-Jae Lee; Minsoo Jeon; Dongseung Kim; Andrew Sohn
Affiliations:
Shin-Jae Lee, Minsoo Jeon, Dongseung Kim: Department of Electrical Engineering, Korea University, Seoul 136-701, Korea
Andrew Sohn: Department of Computer and Information Science, New Jersey Institute of Technology, Newark, New Jersey 07102-1982
Venue:
Journal of Parallel and Distributed Computing
Year:
2002

References (12):
Algorithms.
Tight bounds on the complexity of parallel sorting. IEEE Transactions on Computers.
Sorting n Objects with a k-Sorter. IEEE Transactions on Computers.
Introduction to parallel algorithms and architectures: array, trees, hypercubes.
An introduction to parallel algorithms.
Parallel algorithms for personalized communication and sorting with an experimental study (extended abstract). Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures.
Fast Parallel Sorting Under LogP: Experience with the CM-5. IEEE Transactions on Parallel and Distributed Systems.
Load balanced parallel radix sort. ICS '98 Proceedings of the 12th international conference on Supercomputing.
Minimizing Communication in the Bitonic Sort. IEEE Transactions on Parallel and Distributed Systems.
Sorting. ACM Computing Surveys (CSUR).
Communication-Efficient Bitonic Sort on a Distributed Memory Parallel Computer. ICPADS '01 Proceedings of the Eighth International Conference on Parallel and Distributed Systems.
Identifying the Capability of Overlapping Computation with Communication. PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques.

Cited by (6):
Load-Balanced Parallel Merge Sort on Distributed Memory Parallel Computers. IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium.
Parallelizing Merge Sort onto Distributed Memory Parallel Computers. ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing.
Feedback-directed thread scheduling with memory considerations. Proceedings of the 16th international symposium on High performance distributed computing.
High-speed parallel external sorting of data with arbitrary distribution. International Journal of High Performance Computing and Networking.
Parallel external sort of floating-point data by integer conversion. ACC'08 Proceedings of the WSEAS International Conference on Applied Computing Conference.
Library support for parallel sorting in scientific computations. Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing.

Abstract

Load balanced parallel radix sort solved the load imbalance problem present in parallel radix sort. By redistributing the keys in every radix round so that each processor holds exactly the same number of keys, it reduces the overall sorting time. Load balanced radix sort is currently known as the fastest internal sorting method for distributed-memory multiprocessors. However, once the computation is balanced, the communication time spent on key redistribution becomes the bottleneck of overall sorting performance. In this report we present partitioned parallel radix sort, a new parallel radix sorter that solves this communication problem of balanced radix sort. The new method reduces communication time by eliminating the per-round redistribution steps. The keys are first sorted top-down (left to right, as opposed to right to left) using a few of their most significant bits. Once every key has been moved to the processor responsible for it, the rest of the sort is confined within each processor, so no further global redistribution of keys is needed. The method yields well-balanced communication and computation across processors. It has been implemented on three distributed-memory platforms: IBM SP2, Cray T3E, and a PC cluster. Experimental results with various key distributions show that partitioned parallel radix sort significantly outperforms balanced radix sort. Execution time improves by 13% to 30% on the IBM SP2 and by 20% to 100% on the Cray/SGI T3E; on the PC cluster the improvement exceeds 2.4-fold.
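The abstract describes the method only at a high level. What follows is a minimal sequential Python sketch that simulates the idea on P processors. It rests on several assumptions of our own, not details taken from the paper: keys are non-negative 32-bit integers, the top 8 bits form the partition digit, contiguous bucket ranges are assigned to processors greedily, and the local phase is a plain LSD radix sort over the whole key. The names `partitioned_radix_sort` and `lsd_radix_sort` are hypothetical. The all-reduce and all-to-all exchange that a real message-passing implementation would perform are replaced by ordinary list operations.

```python
import random

KEY_BITS = 32    # assumed key width (non-negative integers < 2**KEY_BITS)
RADIX_BITS = 8   # assumed digit width for the local LSD passes


def lsd_radix_sort(keys, key_bits=KEY_BITS, radix_bits=RADIX_BITS):
    """Stable LSD (right-to-left) radix sort, used here as the local phase."""
    mask = (1 << radix_bits) - 1
    for shift in range(0, key_bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        keys = [k for bucket in buckets for k in bucket]
    return list(keys)


def partitioned_radix_sort(local_keys, msb_bits=8, key_bits=KEY_BITS):
    """Sequentially simulate partitioned parallel radix sort (a sketch).

    local_keys[p] holds the keys initially on hypothetical processor p.
    Returns one sorted list per processor; concatenating them in rank
    order gives the globally sorted sequence.
    """
    p_count = len(local_keys)
    n_buckets = 1 << msb_bits
    shift = key_bits - msb_bits

    # 1. Each processor builds a histogram of its keys' top msb_bits bits.
    local_hist = [[0] * n_buckets for _ in range(p_count)]
    for p, keys in enumerate(local_keys):
        for k in keys:
            local_hist[p][k >> shift] += 1

    # 2. Sum the local histograms (an all-reduce on a real machine).
    global_hist = [sum(h[b] for h in local_hist) for b in range(n_buckets)]
    total = sum(global_hist)

    # 3. Greedily give each processor a contiguous range of buckets holding
    #    about total / p_count keys; balance is limited by bucket size.
    owner = [0] * n_buckets
    running, proc = 0, 0
    for b in range(n_buckets):
        owner[b] = proc
        running += global_hist[b]
        while proc < p_count - 1 and running >= (proc + 1) * total / p_count:
            proc += 1

    # 4. One all-to-all exchange: each key moves once, to its bucket's owner.
    received = [[] for _ in range(p_count)]
    for keys in local_keys:
        for k in keys:
            received[owner[k >> shift]].append(k)

    # 5. The rest of the sort is purely local; no further global redistribution.
    return [lsd_radix_sort(keys, key_bits) for keys in received]


if __name__ == "__main__":
    random.seed(1)
    data = [[random.getrandbits(KEY_BITS) for _ in range(1000)] for _ in range(4)]
    parts = partitioned_radix_sort(data)
    merged = [k for part in parts for k in part]
    assert merged == sorted(k for part in data for k in part)
    print("keys per processor:", [len(part) for part in parts])
```

Bucket ownership is monotone in the bucket index, so concatenating the processors' outputs in rank order yields the globally sorted sequence, and in this sketch every key crosses between processors exactly once. Load balance is only as fine as the bucket granularity. If a skewed distribution puts most keys into one MSB bucket, the sketch leaves one processor with most of the work.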