Hashing is one of the fundamental techniques used to implement query processing operators such as grouping, aggregation, and join. This paper studies the interaction between modern computer architecture and hash-based query processing techniques. First, we focus on extracting maximum hashing performance from super-scalar CPUs. In particular, we discuss fast hash functions and ways to handle multi-column keys efficiently, and we propose using a recently introduced hashing scheme, Cuckoo Hashing, in place of the commonly used bucket-chained hashing. In the second part of the paper, we focus on CPU cache usage: we dynamically partition data streams so that the partial hash tables fit in the CPU cache. Conventional partitioning works as a separate preparatory phase and forces materialization, which may require I/O if the stream does not fit in RAM. We introduce best-effort partitioning, a technique that interleaves partitioning with the execution of hash-based query processing operators and avoids I/O. In the process, we show how to prevent cache-line alignment problems in partitioning that can strongly decrease throughput. We also demonstrate overall query processing performance when CPU-efficient hashing and best-effort partitioning are combined.
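To make the contrast with bucket-chained hashing concrete, the following is a minimal sketch of a two-table cuckoo hash used for a grouped count. It is not the implementation evaluated in the paper: the class name `CuckooHashTable`, the multiplicative hash constants, the initial capacity, the displacement limit, and the grow-by-doubling policy are all assumptions chosen for illustration. The property it shows is that a lookup probes at most two slots and never follows a chain.

```python
class CuckooHashTable:
    """Two-table cuckoo hash for integer keys (illustrative sketch only)."""

    MASK64 = 0xFFFFFFFFFFFFFFFF

    def __init__(self, capacity=16, max_kicks=32):
        self.capacity = capacity      # slots per table (assumed starting size)
        self.max_kicks = max_kicks    # displacement limit before growing (assumed)
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which, key):
        # Two cheap multiplicative hashes; the odd multipliers are arbitrary
        # choices for this sketch, not the functions studied in the paper.
        mult = (0x9E3779B97F4A7C15, 0xC2B2AE3D27D4EB4F)[which]
        return (((key * mult) & self.MASK64) >> 32) % self.capacity

    def get(self, key, default=None):
        # A key can live in exactly one of two slots, so a lookup is two probes.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def put(self, key, value):
        # Overwrite in place if the key is already stored.
        for which in (0, 1):
            idx = self._slot(which, key)
            entry = self.tables[which][idx]
            if entry is not None and entry[0] == key:
                self.tables[which][idx] = (key, value)
                return
        # Otherwise insert, evicting the current occupant and re-homing it
        # in the other table until an empty slot is found.
        entry = (key, value)
        which = 0
        for _ in range(self.max_kicks):
            idx = self._slot(which, entry[0])
            entry, self.tables[which][idx] = self.tables[which][idx], entry
            if entry is None:
                return
            which = 1 - which
        # Too many displacements: enlarge both tables, then place the
        # entry that is still homeless.
        self._grow()
        self.put(entry[0], entry[1])

    def _grow(self):
        old = [e for table in self.tables for e in table if e is not None]
        self.capacity *= 2
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k, v in old:
            self.put(k, v)


# Usage: a grouped COUNT(*) over a small stream of integer keys.
counts = CuckooHashTable()
for key in [7, 3, 7, 42, 3, 7]:
    counts.put(key, counts.get(key, 0) + 1)
```

In this sketch the insert path pays for the bounded lookup: an insert may displace several entries or trigger a rebuild, while `get` always touches at most two slots.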