Algorithms for high-throughput disk-to-disk sorting

Authors:
Hari Sundar; Dhairya Malhotra; Karl W. Schulz
Affiliations:
The University of Texas at Austin, Austin, TX; The University of Texas at Austin, Austin, TX; Texas Advanced Computing Center, Austin, TX
Venue:
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2013

Abstract

In this paper, we present a new out-of-core sort algorithm, designed for problems that are too large to fit into the aggregate RAM available on modern supercomputers. We analyze its performance, including the cost of IO, and demonstrate the fastest reported throughput (to the best of our knowledge) using the canonical sortBenchmark on a general-purpose, production HPC resource running Lustre. Through careful use of available storage and asynchronous data transfer, we almost completely hide the computation (sorting) behind the IO latency. This latency hiding yields comparable execution times, including the additional temporary IO required, between a large sort problem (5TB) run as a single in-RAM sort and our out-of-core approach using one tenth of the RAM. In our largest run, sorting 100TB of records on 1792 hosts, we achieved an end-to-end throughput of 1.24TB/min with our general-purpose sorter, a 65% improvement over the current Daytona record holder.
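
The idea of hiding sorting behind IO can be illustrated with a small single-machine sketch. This is not the algorithm from the paper, which targets a distributed system with a parallel file system. It is a generic external merge sort, written under the assumption that records are newline-terminated text lines, in which a two-thread pool prefetches the next chunk and writes the previous sorted run while the main thread sorts the current one. The names external_sort, read_chunk, write_run and chunk_records are hypothetical and chosen only for this illustration.

```python
import heapq
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_chunk(src, chunk_records):
    """Read up to chunk_records lines from an open text file."""
    lines = []
    for _ in range(chunk_records):
        line = src.readline()
        if not line:
            break
        # Normalize a final line that lacks a newline so runs merge cleanly.
        if not line.endswith("\n"):
            line += "\n"
        lines.append(line)
    return lines


def write_run(sorted_lines, tmp_dir):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run", text=True)
    with os.fdopen(fd, "w") as out:
        out.writelines(sorted_lines)
    return path


def external_sort(in_path, out_path, chunk_records=100_000):
    """Sort the lines of in_path into out_path, one chunk in RAM at a time."""
    with tempfile.TemporaryDirectory() as tmp_dir, \
            ThreadPoolExecutor(max_workers=2) as io_pool, \
            open(in_path) as src:
        run_paths = []
        prev_write = None
        # Only one read is ever in flight, so access to src is never concurrent.
        pending_read = io_pool.submit(read_chunk, src, chunk_records)
        while True:
            chunk = pending_read.result()
            if not chunk:
                break
            # Prefetch the next chunk; the read overlaps with the sort below.
            pending_read = io_pool.submit(read_chunk, src, chunk_records)
            chunk.sort()
            # Wait for the previous run to reach disk, so that at most three
            # chunks (being read, being sorted, being written) are in memory.
            if prev_write is not None:
                run_paths.append(prev_write.result())
            prev_write = io_pool.submit(write_run, chunk, tmp_dir)
        if prev_write is not None:
            run_paths.append(prev_write.result())

        # Merge phase: stream a k-way merge of the sorted runs to the output.
        runs = [open(p) for p in run_paths]
        try:
            with open(out_path, "w") as dst:
                dst.writelines(heapq.merge(*runs))
        finally:
            for run in runs:
                run.close()
```

In this sketch the temporary run files stand in for the "additional temporary IO" mentioned in the abstract, and chunk_records bounds how much data is held in memory at once. How much of the sort time is actually hidden depends on the relative speed of the disk and the CPU, which the sketch makes no attempt to model.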