Many sorting algorithms have been studied, but only a few can effectively exploit both single-instruction multiple-data (SIMD) instructions and thread-level parallelism. In this paper, we propose a new high-performance sorting algorithm, called aligned-access sort (AA-sort), that exploits both the SIMD instructions and the thread-level parallelism available on today's multicore processors. Our algorithm consists of two phases: an in-core sorting phase and an out-of-core merging phase. The in-core sorting phase uses our new sorting algorithm, which extends combsort to exploit SIMD instructions. The out-of-core merging phase is based on mergesort and uses our novel vectorized merging algorithm. Both phases can take advantage of SIMD instructions. The key to high performance is eliminating, in both phases, the unaligned memory accesses that would reduce the effectiveness of SIMD instructions. We implemented and evaluated AA-sort on PowerPC 970MP and Cell Broadband Engine platforms. In summary, when sorting 32 million random 32-bit integers on PowerPC 970MP, a sequential version of AA-sort using SIMD instructions outperformed IBM's optimized sequential sorting library by a factor of 1.8 and bitonic mergesort using SIMD instructions by a factor of 3.3. In addition, a parallel version of AA-sort scaled better with increasing numbers of cores than a parallel version of bitonic mergesort on both platforms. Copyright © 2011 John Wiley & Sons, Ltd.
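To make the combsort idea concrete, the following is a minimal Python sketch, not the authors' implementation. It shows plain scalar combsort, then a lane-wise variant in which each compare-and-swap is replaced by an element-wise min/max on whole vectors, which is the kind of branch-free operation on aligned data that SIMD instructions provide. The lane count `W`, the function names, the shrink factor of 1.3, and the final `heapq.merge` step are all assumptions made for illustration; the abstract does not describe how the paper combines the lanes or how its vectorized merge works.

```python
import heapq

W = 4  # assumed lane count: four 32-bit integers per 128-bit SIMD register


def combsort(a, shrink=1.3):
    """Scalar combsort: compare elements `gap` apart, shrinking the gap each pass."""
    n = len(a)
    gap, swapped = n, True
    while gap > 1 or swapped:
        gap = max(1, int(gap / shrink))
        swapped = False
        for i in range(n - gap):
            if a[i] > a[i + gap]:
                a[i], a[i + gap] = a[i + gap], a[i]
                swapped = True
    return a


def vec_compare_exchange(lo, hi):
    """Lane-wise min/max of two W-element lists, emulating SIMD min/max."""
    return ([min(x, y) for x, y in zip(lo, hi)],
            [max(x, y) for x, y in zip(lo, hi)])


def lanewise_combsort(values, shrink=1.3):
    """Combsort over W-wide vectors (hypothetical helper for illustration)."""
    if len(values) % W:
        raise ValueError("length must be a multiple of W")
    # Split the input into aligned W-element vectors.
    vecs = [values[i:i + W] for i in range(0, len(values), W)]
    n = len(vecs)
    gap, changed = n, True
    while gap > 1 or changed:
        gap = max(1, int(gap / shrink))
        changed = False
        for i in range(n - gap):
            lo, hi = vec_compare_exchange(vecs[i], vecs[i + gap])
            if lo != vecs[i]:
                changed = True
            vecs[i], vecs[i + gap] = lo, hi
    # Each lane (column) is now sorted on its own. Merging the W columns
    # with heapq.merge is a stand-in, not the paper's reordering step.
    columns = [[v[j] for v in vecs] for j in range(W)]
    return list(heapq.merge(*columns))
```

Calling `lanewise_combsort(list(data))` on a list whose length is a multiple of `W` returns the same result as `sorted(data)`. The point of the sketch is that the inner loop touches only whole, aligned vectors and contains no data-dependent swap of individual elements.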