AA-Sort: A New Parallel Sorting Algorithm for Multi-Core SIMD Processors

Authors:
Hiroshi Inoue, Takao Moriyama, Hideaki Komatsu, Toshio Nakatani
Affiliations:
IBM Tokyo Research Laboratory, Japan (all authors)
Venue:
PACT '07: Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Year:
2007



Abstract

Many sorting algorithms have been studied in the past, but only a few can effectively exploit both SIMD instructions and thread-level parallelism. In this paper, we propose a new parallel sorting algorithm, called Aligned-Access sort (AA-sort), for shared-memory multiprocessors. The AA-sort algorithm takes advantage of SIMD instructions. The key to its high performance is eliminating unaligned memory accesses, which would reduce the effectiveness of SIMD instructions. We implemented and evaluated AA-sort on the PowerPC® 970MP and the Cell Broadband Engine™. In summary, a sequential version of AA-sort using SIMD instructions outperformed IBM's optimized sequential sorting library by 1.8 times and GPUTeraSort using SIMD instructions by 3.3 times on the PowerPC 970MP when sorting 32 M random 32-bit integers. Furthermore, a parallel version of AA-sort demonstrated better scalability with increasing numbers of cores than a parallel version of GPUTeraSort on both platforms.