Fast in-place sorting with CUDA based on bitonic sort

Authors:
Hagen Peters; Ole Schulz-Hildebrandt; Norbert Luttenberger
Affiliations:
Research Group for Communication Systems, Department of Computer Science, Christian-Albrechts-University Kiel, Germany (all three authors)
Venue:
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Year:
2009

Abstract

State-of-the-art graphics processors provide high processing power, and the programmability offered by frameworks like CUDA increases their usability as high-performance coprocessors for general-purpose computing. Sorting is well investigated in computer science in general, but this new field of application for GPUs creates a demand for high-performance parallel sorting algorithms that fit the characteristics of modern GPU architectures. We present a high-performance in-place implementation of Batcher's bitonic sorting networks for CUDA-enabled GPUs. We adapted bitonic sort to arbitrary input lengths and assigned compare/exchange operations to threads in a way that reduces slow global-memory accesses and thereby greatly increases the performance of the implementation.
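
To make the idea of an in-place bitonic sorting network for arbitrary input lengths concrete, here is a minimal sequential sketch in Python. It is an illustration under stated assumptions, not the authors' CUDA implementation: it uses the variant of the network in which every comparator puts the smaller element at the lower index, so comparators whose partner index falls beyond the end of the array can simply be skipped (they behave as if the array were padded with +infinity). The function name `bitonic_sort_in_place` is hypothetical, and the sketch does not model the thread assignment or global-memory optimizations described in the abstract.

```python
def bitonic_sort_in_place(a):
    """Sort list `a` ascending, in place, with a bitonic sorting network.

    Assumption: this is the "all comparators ascending" form of the network.
    Indices >= len(a) act as virtual +infinity padding, so any comparator
    that would touch them is skipped and no extra memory is needed.
    """
    n = len(a)
    k = 2  # size of the blocks being merged in the current phase
    while k < 2 * n:  # run until k reaches the next power of two >= n
        # First step of the merge: compare each element with its mirror
        # image inside the block of size k (partner index i XOR (k - 1)).
        for i in range(n):
            partner = i ^ (k - 1)
            if i < partner < n and a[i] > a[partner]:
                a[i], a[partner] = a[partner], a[i]
        # Remaining steps: half-cleaners with distances k/4, k/8, ..., 1.
        j = k // 4
        while j > 0:
            # On a GPU, each iteration of this loop body is independent of
            # the others within one step, so it could be one thread's work.
            for i in range(n):
                partner = i ^ j
                if i < partner < n and a[i] > a[partner]:
                    a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2


data = [5, 3, 8, 1, 9, 2, 7]  # length 7, deliberately not a power of two
bitonic_sort_in_place(data)
print(data)  # prints [1, 2, 3, 5, 7, 8, 9]
```

Within a single step the compared pairs are disjoint, which is what makes the network suitable for data-parallel execution. How those compare/exchange operations are grouped per thread to limit global-memory traffic is the contribution of the paper and is not reproduced here.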