Register level sort algorithm on multi-core SIMD processors

  • Authors:
  • Tian Xiaochen;Kamil Rocki;Reiji Suda

  • Affiliations:
  • The University of Tokyo & CREST, JST;The University of Tokyo & CREST, JST;The University of Tokyo & CREST, JST

  • Venue:
  • IA^3 '13 Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms
  • Year:
  • 2013

Abstract

State-of-the-art hardware increasingly relies on SIMD parallelism, where multiple processing elements execute the same instruction on multiple data points simultaneously. However, irregular and data-intensive algorithms are not well suited to such architectures. Because these algorithms are important, obtaining efficient implementations of them is crucial. One example is sorting, a fundamental problem in computer science. In this paper we analyze distinct memory access models and propose two methods that use SIMD instructions to implement a highly efficient bitonic merge sort as a register-level sort. We achieve a nearly 270x speedup (525M integers/s) on a set of 4M integers using a Xeon Phi coprocessor, where SIMD-level parallelism accelerates the algorithm by more than 3x. Our method can be applied to any device supporting similar SIMD instructions.
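
To illustrate the kind of network the abstract refers to, here is a minimal scalar sketch of a bitonic sort in Python. It is not the authors' implementation and uses no SIMD instructions; the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for this example. The point is that within each stage (a fixed pair `k`, `j`) all compare-exchange operations touch disjoint pairs of elements, which is what lets a SIMD unit perform a stage with lane-wise min/max operations and a shuffle.

```python
def bitonic_sort(values):
    """Return a sorted copy of `values` using a bitonic sorting network.

    Illustrative sketch only: the length must be a power of two, and the
    inner loop over `i` stands in for what would be SIMD lanes.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")

    a = list(values)
    k = 2
    while k <= n:                  # size of the sequences being merged
        j = k // 2
        while j >= 1:              # compare-exchange distance in this stage
            # Every pair (i, i ^ j) in this stage is independent of the others.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    lo = min(a[i], a[partner])
                    hi = max(a[i], a[partner])
                    a[i], a[partner] = (lo, hi) if ascending else (hi, lo)
            j //= 2
        k *= 2
    return a
```

For example, `bitonic_sort([5, 2, 7, 1, 8, 3, 6, 4])` runs three merge phases (k = 2, 4, 8) with six compare-exchange stages in total, and the sequence of comparisons does not depend on the data.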