Efficient frequent item counting in multi-core hardware

Authors:
Pratanu Roy;Jens Teubner;Gustavo Alonso
Affiliations:
ETH Zurich, Zurich, Switzerland;ETH Zurich, Zurich, Switzerland;ETH Zurich, Zurich, Switzerland
Venue:
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2012

Abstract

The increasing number of cores and the rich instruction sets of modern hardware open up new opportunities for optimizing many traditional data mining tasks. In this paper we demonstrate how to speed up the computation of frequent items by almost one order of magnitude over the best published results by matching the algorithm to the underlying hardware architecture. We start from the observation that frequent item counting, like other data mining tasks, assumes a certain amount of skew in the data. We exploit this skew to design a new algorithm whose pre-filtering stage can be implemented very efficiently with SIMD instructions. Using pipelining, we then combine this pre-filtering stage with a conventional frequent item algorithm (Space-Saving) that processes the remainder of the data. The resulting operator can be parallelized across a small number of cores, yielding a parallel implementation that offers significantly higher throughput and, when the results are queried, suffers none of the overheads of existing parallel solutions.
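
To make the two-stage idea concrete, the following is a minimal single-threaded sketch, not the authors' implementation. It assumes a small exact-count filter in front of a textbook Space-Saving summary. The class names `SpaceSaving` and `FilteredCounter`, the parameters `filter_slots` and `capacity`, and the first-come policy for filling the filter are all hypothetical; the abstract does not say how the filter is populated or maintained. The dictionary lookup merely stands in for the SIMD comparison, and pipelining across cores is not modeled.

```python
from collections import Counter


class SpaceSaving:
    """Space-Saving summary that monitors at most `capacity` items."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}  # item -> estimated count (never an underestimate)
        self.errors = {}  # item -> maximum possible overestimation

    def add(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest count; the newcomer
            # inherits that count as its error bound.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            self.errors.pop(victim)
            self.counts[item] = floor + 1
            self.errors[item] = floor


class FilteredCounter:
    """Exact-count pre-filter in front of a Space-Saving summary.

    Assumption: the first `filter_slots` distinct items occupy the filter
    for good. This policy is illustrative only.
    """

    def __init__(self, filter_slots, capacity):
        self.filter_slots = filter_slots
        self.filter = {}  # item -> exact count
        self.summary = SpaceSaving(capacity)

    def add(self, item):
        if item in self.filter:
            self.filter[item] += 1      # hit: absorbed by the filter
        elif len(self.filter) < self.filter_slots:
            self.filter[item] = 1       # free slot: start tracking exactly
        else:
            self.summary.add(item)      # miss: forward the remainder

    def top(self, n):
        # Filter and summary hold disjoint item sets, so merging is safe.
        merged = Counter(self.filter)
        merged.update(self.summary.counts)
        return merged.most_common(n)
```

With skewed input, most updates hit the filter and only the remainder reaches the Space-Saving stage, which is the effect the abstract relies on. A real filter would have to keep the hottest items rather than the earliest ones.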