Memory-efficient groupby-aggregate using compressed buffer trees

  • Authors:
  • Hrishikesh Amur;Wolfgang Richter;David G. Andersen;Michael Kaminsky;Karsten Schwan;Athula Balachandran;Erik Zawadzki

  • Affiliations:
  • Georgia Institute of Technology;Carnegie Mellon University;Carnegie Mellon University;Intel Labs;Georgia Institute of Technology;Carnegie Mellon University;Carnegie Mellon University

  • Venue:
  • Proceedings of the 4th annual Symposium on Cloud Computing
  • Year:
  • 2013

Abstract

The rapid growth of fast analytics systems that require data to be processed in memory makes memory capacity an increasingly precious resource. This paper introduces a new compressed data structure called the Compressed Buffer Tree (CBT). Using a combination of techniques including buffering, compression, and serialization, CBTs improve the memory efficiency and performance of the GroupBy-Aggregate abstraction, which forms the basis not only of batch-processing models like MapReduce but also of recent fast analytics systems. For streaming workloads, aggregation using the CBT uses 21–42% less memory than aggregation using Google SparseHash, with up to 16% better throughput. Compared with the batch-mode aggregators in the MapReduce runtimes Phoenix++ and Metis, the CBT consumes 4x and 5x less memory and delivers 1.5–2x and 3–4x higher performance, respectively.
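
To make the GroupBy-Aggregate abstraction and the three techniques named above (buffering, serialization, compression) concrete, here is a minimal Python sketch. It is not the CBT: it keeps a flat list of compressed partial aggregates rather than a tree, and the class name, the `buffer_limit` parameter, and the choice of `pickle` and `zlib` are all assumptions made for illustration. It only shows the general idea of aggregating in a small uncompressed buffer and holding everything else in compressed, serialized form.

```python
import pickle
import zlib
from collections import Counter


class BufferedCompressedAggregator:
    """Toy GroupBy-Aggregate (sum per key) that keeps spilled data compressed.

    Hypothetical illustration only: a flat list of compressed buffers,
    not the tree structure described in the paper.
    """

    def __init__(self, buffer_limit=1000):
        self.buffer_limit = buffer_limit  # max distinct keys held uncompressed
        self.buffer = Counter()           # uncompressed partial aggregates
        self.chunks = []                  # serialized, compressed partial aggregates

    def insert(self, key, value=1):
        # Aggregate eagerly inside the uncompressed buffer.
        self.buffer[key] += value
        if len(self.buffer) >= self.buffer_limit:
            self._spill()

    def _spill(self):
        # Serialize the buffer, compress it, and release the uncompressed copy.
        if self.buffer:
            payload = pickle.dumps(dict(self.buffer))
            self.chunks.append(zlib.compress(payload))
            self.buffer = Counter()

    def finalize(self):
        # Merge all partial aggregates, decompressing one chunk at a time.
        self._spill()
        result = Counter()
        for chunk in self.chunks:
            result.update(pickle.loads(zlib.decompress(chunk)))
        return dict(result)


# Word count as a GroupBy-Aggregate: group by word, aggregate by summing.
agg = BufferedCompressedAggregator(buffer_limit=2)
for word in ["a", "b", "a", "c", "b", "a"]:
    agg.insert(word)
totals = agg.finalize()
```

In this sketch the same key can appear in several compressed chunks, so the final merge does the remaining aggregation work. How the CBT itself organizes and merges its buffers is described in the paper, not here.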