High throughput heavy hitter aggregation for modern SIMD processors

Authors:
Orestis Polychroniou;Kenneth A. Ross
Affiliations:
Columbia University;Columbia University
Venue:
Proceedings of the Ninth International Workshop on Data Management on New Hardware
Year:
2013

Abstract

Heavy hitters are data items that occur at high frequency in a data set. They are among the most important items for an organization to summarize and understand during analytical processing. In data sets with sufficient skew, the number of heavy hitters can be relatively small. We take advantage of this small footprint to compute aggregate functions for the heavy hitters in fast cache memory in a single pass. We design cache-resident, shared-nothing structures that hold only the most frequent elements. Our algorithm works in three phases. It first samples the input and picks heavy hitter candidates. It then builds a hash table and computes the exact aggregates of these candidates. Finally, a validation step identifies the true heavy hitters from among the candidates. We identify trade-offs between the hash table configuration and performance. A configuration consists of the probing algorithm and the table capacity, which determines how many candidates can be aggregated. The probing algorithm can be perfect hashing, cuckoo hashing, or bucketized hashing, which lets us explore trade-offs between size and speed. We optimize performance using SIMD instructions, applied in novel ways beyond single vectorized operations, to minimize cache accesses and the instruction footprint.
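
The three phases described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `heavy_hitter_aggregates`, its parameters (`threshold`, `sample_size`, `capacity`, `seed`), sampling with replacement, and the choice of count and sum as the aggregates are all assumptions made for illustration. An ordinary Python dict stands in for the cache-resident, SIMD-probed hash table, so the sketch shows only the control flow, not the performance techniques.

```python
import random
from collections import Counter


def heavy_hitter_aggregates(rows, threshold, sample_size=1000, capacity=64, seed=0):
    """Return {key: (count, sum)} for keys occurring in at least
    `threshold` (a fraction) of `rows`, where rows is a list of (key, value).

    Hypothetical sketch: a dict replaces the paper's fixed-size hash table.
    """
    rng = random.Random(seed)
    n = len(rows)

    # Phase 1: sample keys (with replacement, an assumption) and keep the
    # `capacity` most frequent sampled keys as heavy hitter candidates.
    sample = [rows[rng.randrange(n)][0] for _ in range(min(sample_size, n))]
    candidates = [key for key, _ in Counter(sample).most_common(capacity)]

    # Phase 2: build a table holding only the candidates, then make one
    # pass over the data to compute their exact aggregates. Rows whose key
    # is not a candidate are skipped.
    table = {key: [0, 0] for key in candidates}  # key -> [count, sum]
    for key, value in rows:
        slot = table.get(key)
        if slot is not None:
            slot[0] += 1
            slot[1] += value

    # Phase 3: validate. The counts are exact, so a candidate is a true
    # heavy hitter only if its count reaches the threshold.
    min_count = threshold * n
    return {k: (c, s) for k, (c, s) in table.items() if c >= min_count}
```

Because the table is capped at `capacity` entries regardless of input size, its footprint depends on the number of candidates rather than on the number of distinct keys, which is the property that allows the real structure to stay in cache. A true heavy hitter that the sample misses is not reported by this sketch.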