A randomized parallel sorting algorithm with an experimental study. Journal of Parallel and Distributed Computing.
Main-memory index structures with fixed-size partial keys. SIGMOD '01: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data.
Fast parallel in-memory 64-bit sorting. ICS '01: Proceedings of the 15th International Conference on Supercomputing.
Improved long-period generators based on linear recurrences modulo 2. ACM Transactions on Mathematical Software (TOMS).
Memory Systems: Cache, DRAM, Disk.
An Efficient Parallel Algorithm for Graph-Based Image Segmentation. CAIP '09: Proceedings of the 13th International Conference on Computer Analysis of Images and Patterns.
Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data.
We present a fast radix sorting algorithm that builds upon a microarchitecture-aware variant of counting sort. Taking advantage of virtual memory and making use of write-combining yields a per-pass throughput corresponding to at least 89% of the system's peak memory bandwidth. Our implementation outperforms Intel's recently published radix sort by a factor of 1.64. It also compares favorably to the reported performance of an algorithm for Fermi GPUs when data-transfer overhead is included. These results indicate that scalar, bandwidth-sensitive sorting algorithms remain competitive on current architectures. Various other memory-intensive applications can benefit from the techniques described herein.
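The abstract describes a radix sort built from repeated counting-sort passes whose scatter phase benefits from write-combining. The following is a minimal single-threaded sketch in C, not the authors' implementation. It assumes 32-bit unsigned keys, 8-bit digits and 64-byte cache lines (all illustrative choices, not parameters from the paper). Each pass histograms one digit, computes bucket offsets with an exclusive prefix sum, and then stages keys in small per-bucket buffers so that output is written in cache-line-sized blocks, a software approximation of write-combining. It omits the virtual-memory technique and the non-temporal streaming stores the abstract alludes to, and it does not align the flushed blocks. The names `radix_sort_u32`, `counting_pass` and `WC_KEYS` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative parameters (hypothetical, not taken from the paper). */
#define RADIX_BITS 8
#define BUCKETS (1u << RADIX_BITS)
/* 64-byte cache line / 4-byte key = 16 keys per software write-combining buffer. */
#define WC_KEYS 16

/* One stable counting-sort pass over the digit at bit offset `shift`:
   histogram, exclusive prefix sum, then a scatter that stages keys in
   per-bucket buffers and copies a full buffer to `dst` in one block.
   Blocks are not cache-line aligned and use plain memcpy, not streaming stores. */
static void counting_pass(const uint32_t *src, uint32_t *dst, size_t n, unsigned shift)
{
    size_t count[BUCKETS] = {0};
    size_t pos[BUCKETS];
    uint32_t buf[BUCKETS][WC_KEYS];
    unsigned fill[BUCKETS] = {0};

    /* 1. Histogram of the current digit. */
    for (size_t i = 0; i < n; i++)
        count[(src[i] >> shift) & (BUCKETS - 1)]++;

    /* 2. Exclusive prefix sum gives each bucket's starting output index. */
    size_t sum = 0;
    for (unsigned b = 0; b < BUCKETS; b++) {
        pos[b] = sum;
        sum += count[b];
    }
    assert(sum == n);

    /* 3. Buffered scatter: one bulk copy per WC_KEYS keys instead of
          WC_KEYS scattered single-key stores. */
    for (size_t i = 0; i < n; i++) {
        unsigned b = (src[i] >> shift) & (BUCKETS - 1);
        buf[b][fill[b]++] = src[i];
        if (fill[b] == WC_KEYS) {
            memcpy(dst + pos[b], buf[b], sizeof buf[b]);
            pos[b] += WC_KEYS;
            fill[b] = 0;
        }
    }

    /* 4. Flush partially filled buffers at the end of the pass. */
    for (unsigned b = 0; b < BUCKETS; b++) {
        memcpy(dst + pos[b], buf[b], fill[b] * sizeof(uint32_t));
        pos[b] += fill[b];
    }
}

/* LSD radix sort of 32-bit unsigned keys in place: four 8-bit passes that
   ping-pong between `keys` and a scratch array.
   Returns 0 on success, -1 if the scratch allocation fails. */
int radix_sort_u32(uint32_t *keys, size_t n)
{
    if (n < 2)
        return 0;
    uint32_t *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    uint32_t *src = keys, *dst = tmp;
    for (unsigned shift = 0; shift < 32; shift += RADIX_BITS) {
        counting_pass(src, dst, n, shift);
        uint32_t *t = src;
        src = dst;
        dst = t;
    }
    /* An even number of passes (4) leaves the sorted result in `keys`. */
    assert(src == keys);
    free(tmp);
    return 0;
}
```

Usage: `radix_sort_u32(keys, n)` sorts the array in place and returns 0, or -1 if the scratch buffer cannot be allocated. Because each bucket's buffer is flushed in arrival order, every pass is stable, which is what makes least-significant-digit radix sort correct. Staging 16 four-byte keys per bucket means each full flush writes one 64-byte line's worth of data, rather than interleaving single-key stores to 256 different output regions.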