TritonSort: a balanced large-scale sorting system

Authors:
Alexander Rasmussen;George Porter;Michael Conley;Harsha V. Madhyastha;Radhika Niranjan Mysore;Alexander Pucher;Amin Vahdat
Affiliations:
UC San Diego;UC San Diego;UC San Diego;UC Riverside;UC San Diego;Vienna University of Technology;UC San Diego
Venue:
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Year:
2011



Abstract

We present TritonSort, a highly efficient, scalable sorting system. It is designed to process large datasets, and has been evaluated against as much as 100 TB of input data spread across 832 disks in 52 nodes at a rate of 0.916 TB/min. When evaluated against the annual Indy GraySort sorting benchmark, TritonSort is 60% better in absolute performance and has over six times the per-node efficiency of the previous record holder. In this paper, we describe the hardware and software architecture necessary to operate TritonSort at this level of efficiency. Through careful management of system resources to ensure cross-resource balance, we are able to sort data at approximately 80% of the disks' aggregate sequential write speed. We believe the work holds a number of lessons for balanced system design and for scale-out architectures in general. While many interesting systems are able to scale linearly with additional servers, per-server performance can lag behind per-server capacity by more than an order of magnitude. Bridging the gap between high scalability and high performance would either enable significantly cheaper systems that do the same work or make it possible to address significantly larger problem sets with the same infrastructure.
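
To put the headline figures in per-node and per-disk terms, the following is a minimal sketch that derives a few numbers from the values quoted in the abstract. It assumes decimal units (1 TB = 1,000,000 MB), which the abstract does not state, and the function name `derived_figures` is illustrative only. The results describe end-to-end sorted output; they are not the raw disk I/O rate, which the abstract does not give.

```python
# Figures quoted in the abstract.
INPUT_TB = 100.0
SORT_RATE_TB_PER_MIN = 0.916
NODES = 52
DISKS = 832

# Assumption (not stated in the abstract): decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def derived_figures():
    """Return per-node and per-disk figures implied by the abstract's numbers."""
    # Cluster-wide sort rate converted from TB/min to MB/s.
    rate_mb_per_s = SORT_RATE_TB_PER_MIN * MB_PER_TB / 60
    return {
        "disks_per_node": DISKS // NODES,
        "elapsed_minutes": INPUT_TB / SORT_RATE_TB_PER_MIN,
        "per_node_mb_per_s": rate_mb_per_s / NODES,
        "per_disk_mb_per_s": rate_mb_per_s / DISKS,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the 100 TB run takes a little under two hours, and each node holds 16 disks.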