HykSort: a new variant of hypercube quicksort on distributed memory architectures

  • Authors:
  • Hari Sundar;Dhairya Malhotra;George Biros

  • Affiliations:
  • University of Texas at Austin, Austin, TX, USA;University of Texas at Austin, Austin, TX, USA;University of Texas at Austin, Austin, TX, USA

  • Venue:
  • Proceedings of the 27th International ACM Conference on Supercomputing
  • Year:
  • 2013

Abstract

In this paper, we present HykSort, an optimized comparison sort for distributed memory architectures that attains a more than 2× improvement over bitonic sort and samplesort. The algorithm is based on the hypercube quicksort, but instead of a binary recursion, we perform a k-way recursion in which the pivots are selected accurately with an iterative parallel select algorithm. The single-node sort is performed using a vectorized and multithreaded merge sort. The advantages of HykSort are lower communication costs, better load balancing, and avoidance of O(p)-collective communication primitives. We also present a staged-communication samplesort, which is more robust than the original samplesort for large core counts. We conduct an experimental study in which we compare hypercube sort, bitonic sort, the original samplesort, the staged samplesort, and HykSort. We report weak and strong scaling results and study the effect of the grain size. It turns out that no single algorithm performs best, so a hybridization strategy is necessary. As a highlight of our study, in our largest experiment, on 262,144 AMD cores of the Cray XK7 "Titan" platform at Oak Ridge National Laboratory, we sorted 8 trillion 32-bit integer keys in 37 seconds, achieving 0.9 TB/s effective throughput.
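The k-way recursion described in the abstract can be illustrated with a small sequential simulation. The sketch below is not the authors' implementation: it models each "process" as a Python list, replaces the iterative parallel select with exact quantiles computed from the gathered group data, and uses a plain sort instead of the vectorized, multithreaded merge sort. The function name `hyksort_sim` and the pairing of senders to receivers are assumptions made for illustration only. It does show the key structural idea: in each round a process exchanges data with only k - 1 partners, and the process group shrinks by a factor of k.

```python
import bisect
import heapq
import random


def hyksort_sim(blocks, k):
    """Sequentially simulate a k-way hypercube-style sort.

    blocks: list of p lists, one per simulated process (p must be a power of k).
    Returns p sorted lists whose concatenation is globally sorted.
    """
    p = len(blocks)
    blocks = [sorted(b) for b in blocks]  # local sort on every "process"
    group = p                             # current size of each process group
    while group > 1:
        assert group % k == 0, "process count must be a power of k"
        sub = group // k                  # size of each of the k subgroups
        incoming = [[] for _ in range(p)]
        for start in range(0, p, group):
            members = range(start, start + group)
            # Stand-in for the iterative parallel select (assumption):
            # take exact quantiles of all keys held by this group.
            merged = sorted(x for r in members for x in blocks[r])
            if not merged:
                continue
            pivots = [merged[(j * len(merged)) // k] for j in range(1, k)]
            for r in members:
                local = blocks[r]
                cuts = [0] + [bisect.bisect_left(local, pv) for pv in pivots]
                cuts.append(len(local))
                for j in range(k):
                    # Bucket j goes to the matching rank in subgroup j.
                    dest = start + j * sub + (r - start) % sub
                    incoming[dest].append(local[cuts[j]:cuts[j + 1]])
        # Each process merges the (at most k) sorted runs it received.
        blocks = [list(heapq.merge(*parts)) for parts in incoming]
        group = sub                       # recurse within each subgroup
    return blocks


if __name__ == "__main__":
    random.seed(1)
    data = [[random.randrange(10_000) for _ in range(100)] for _ in range(16)]
    result = hyksort_sim(data, k=4)
    print([len(b) for b in result])  # per-process key counts after sorting
```

With p = 16 and k = 4 the loop runs for two rounds (log_k p), whereas a binary hypercube quicksort would need four. A real distributed version would replace the `incoming` lists with point-to-point or staged all-to-all exchanges.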