Architecture-conscious hashing

Authors:
Marcin Zukowski;Sándor Héman;Peter Boncz
Affiliations:
CWI, Amsterdam, The Netherlands;CWI, Amsterdam, The Netherlands;CWI, Amsterdam, The Netherlands
Venue:
DaMoN '06 Proceedings of the 2nd international workshop on Data management on new hardware
Year:
2006

Abstract

Hashing is one of the fundamental techniques used to implement query processing operators such as grouping, aggregation, and join. This paper studies the interaction between modern computer architecture and hash-based query processing techniques. First, we focus on extracting maximum hashing performance from super-scalar CPUs. In particular, we discuss fast hash functions and ways to handle multi-column keys efficiently, and we propose replacing the commonly used bucket-chained hashing with Cuckoo Hashing, a recently introduced hashing scheme. In the second part of the paper, we focus on CPU cache usage: we dynamically partition data streams so that the partial hash tables fit in the CPU cache. Conventional partitioning works as a separate preparatory phase that forces materialization, which may require I/O if the stream does not fit in RAM. We introduce best-effort partitioning, a technique that interleaves partitioning with the execution of hash-based query processing operators and avoids I/O. In the process, we show how to prevent cache-line alignment issues in partitioning, which can strongly decrease throughput. Finally, we demonstrate overall query processing performance when CPU-efficient hashing and best-effort partitioning are combined.
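
For readers unfamiliar with Cuckoo Hashing, the following is a minimal textbook-style sketch of the scheme, not the implementation evaluated in the paper. Each key has exactly one candidate slot in each of two tables, so a lookup probes at most two locations instead of walking a bucket chain. The class name `CuckooHashTable`, the `max_kicks` limit, the two multiplicative hash constants, and the "double and rebuild" growth policy are all assumptions made for illustration; keys are assumed to be non-negative integers.

```python
class CuckooHashTable:
    """Illustrative two-table cuckoo hash map for non-negative integer keys."""

    def __init__(self, capacity=16, max_kicks=32):
        self.capacity = capacity          # slots per table (assumed parameter)
        self.max_kicks = max_kicks        # displacement limit before growing
        self.tables = [[None] * capacity for _ in range(2)]
        self.size = 0

    def _slot(self, which, key):
        # Two multiplicative hashes; the constants are arbitrary odd 32-bit
        # numbers picked for illustration, not taken from the paper.
        if which == 0:
            h = (key * 0x9E3779B1) & 0xFFFFFFFF
        else:
            h = ((key ^ (key >> 16)) * 0x85EBCA6B) & 0xFFFFFFFF
        # Map the 32-bit hash to [0, capacity) using its high-order bits.
        return (h * self.capacity) >> 32

    def get(self, key, default=None):
        # A lookup inspects at most two slots, one per table.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def _place(self, entry):
        # Put the entry in its slot; if the slot was occupied, the evicted
        # entry moves to its slot in the other table, and so on.
        # Returns None on success, or the entry left without a slot.
        which = 0
        for _ in range(self.max_kicks):
            i = self._slot(which, entry[0])
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:
                return None
            which = 1 - which
        return entry

    def _grow(self):
        # Double the capacity and reinsert everything currently stored.
        old = [e for table in self.tables for e in table if e is not None]
        while True:
            self.capacity *= 2
            self.tables = [[None] * self.capacity for _ in range(2)]
            if all(self._place(e) is None for e in old):
                return

    def put(self, key, value):
        # Overwrite the value if the key is already present.
        for which in (0, 1):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                return
        entry = (key, value)
        while True:
            entry = self._place(entry)
            if entry is None:
                self.size += 1
                return
            self._grow()  # too many displacements: enlarge, then retry
```

A short usage example: `t = CuckooHashTable(); t.put(42, "x"); t.get(42)` returns `"x"`, and `t.get(7)` returns `None` for a key that was never inserted. The sketch trades insertion cost (displacements and occasional rebuilds) for a bounded number of probes per lookup.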