We describe our experience with the design and implementation of an efficient count sort algorithm on Compute Unified Device Architecture (CUDA) graphics processing units (GPUs). The novelty of this work is twofold. First, we propose a count sort algorithm for integers that needs no synchronization in its last step and thus offers superior performance. Second, we contribute ad hoc techniques for optimizing the performance of the algorithm on CUDA-enabled GPUs. Copyright © 2011 John Wiley & Sons, Ltd.
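
The abstract does not spell out the algorithm, so the following is only a minimal sequential sketch of one way a count sort for integers can avoid synchronization in its last step. It assumes that, once the histogram and its prefix sum are known, each key fills its own disjoint slice of the output, so parallel workers assigned to different keys would never write to the same location. The function name `count_sort` and the parameter `max_value` are hypothetical and chosen for illustration; this is not the paper's CUDA implementation.

```python
from itertools import accumulate


def count_sort(values, max_value):
    """Sort non-negative integers in the range [0, max_value] (illustrative sketch)."""
    # Step 1: histogram of key occurrences.
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1

    # Step 2: exclusive prefix sum gives the first output index of each key.
    starts = [0] + list(accumulate(counts))[:-1]

    # Step 3: each key fills its own slice [start, start + n) of the output.
    # The slices are disjoint, so (by assumption) one worker per key could
    # run this step concurrently without atomics or barriers.
    out = [0] * len(values)
    for key, (start, n) in enumerate(zip(starts, counts)):
        out[start:start + n] = [key] * n
    return out
```

In this sketch the last step depends only on `starts` and `counts`, not on the order of the input, which is what would make it free of write conflicts. Because the output is rebuilt from the keys alone, the sketch applies to plain integers, not to records that carry a payload alongside the key.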