Communications of the ACM - Special issue on parallelism
A Parallel Hash Join Algorithm for Managing Data Skew
IEEE Transactions on Parallel and Distributed Systems
AlphaSort: a RISC machine sort
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Quickly generating billion-record synthetic databases
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Randomized algorithms
Implementing database operations using SIMD instructions
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
IEEE Transactions on Knowledge and Data Engineering
Optimizing Main-Memory Join on Modern Hardware
IEEE Transactions on Knowledge and Data Engineering
Lazy Task Creation: A Technique for Increasing the Granularity of Parallel Programs
IEEE Transactions on Parallel and Distributed Systems
A Parallel Sort Merge Join Algorithm for Managing Data Skew
IEEE Transactions on Parallel and Distributed Systems
A Study of Index Structures for Main Memory Database Management Systems
VLDB '86 Proceedings of the 12th International Conference on Very Large Data Bases
Database Architecture Optimized for the New Bottleneck: Memory Access
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Hash-Based Join Algorithms for Multiprocessor Computers
VLDB '90 Proceedings of the 16th International Conference on Very Large Data Bases
What Happens During a Join? Dissecting CPU and Memory Optimization Effects
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
An Adaptive Hash Join Algorithm for Multiuser Environments
VLDB '90 Proceedings of the 16th International Conference on Very Large Data Bases
Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning
VLDB '91 Proceedings of the 17th International Conference on Very Large Data Bases
Cache Conscious Algorithms for Relational Query Processing
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Photon mapping on programmable graphics hardware
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Improving Hash Join Performance through Prefetching
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
GPUTeraSort: high performance graphics co-processor sorting for large database management
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Multiprocessor hash-based join algorithms
VLDB '85 Proceedings of the 11th international conference on Very Large Data Bases - Volume 11
Adaptive aggregation on chip multiprocessors
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Executing stream joins on the cell processor
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
Relational joins on graphics processors
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Atomic Vector Operations on Chip Multiprocessors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Efficient implementation of sorting on multi-core SIMD CPU architecture
Proceedings of the VLDB Endowment
Data partitioning on chip multiprocessors
Proceedings of the 4th international workshop on Data management on new hardware
Sorting networks and their applications
AFIPS '68 (Spring) Proceedings of the April 30--May 2, 1968, spring joint computer conference
Storage and access in relational data bases
IBM Systems Journal
GPU-ABiSort: optimal parallel sorting on stream architectures
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Cache-Conscious collision resolution in string hash tables
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
FAST: fast architecture sensitive tree search on modern CPUs and GPUs
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Parallel skyline computation on multicore architectures
Information Systems
Design and evaluation of main memory hash join algorithms for multi-core CPUs
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Designing fast architecture-sensitive tree search on modern multicore/many-core processors
ACM Transactions on Database Systems (TODS)
Fast updates on read-optimized databases using multi-core CPUs
Proceedings of the VLDB Endowment
SharedDB: killing one thousand queries with one stone
Proceedings of the VLDB Endowment
MCJoin: a memory-constrained join for column-store main-memory databases
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Reducing cache misses in hash join probing phase by pre-sorting strategy (abstract only)
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
DaMoN '12 Proceedings of the Eighth International Workshop on Data Management on New Hardware
Massively parallel sort-merge joins in main memory multi-core database systems
Proceedings of the VLDB Endowment
Efficient frequent item counting in multi-core hardware
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Parallel processing for stepwise generalisation method on multi-core PC cluster
International Journal of Knowledge and Web Intelligence
Vector Extensions for Decision Support DBMS Acceleration
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Navigating big data with high-throughput, energy-efficient data partitioning
Proceedings of the 40th Annual International Symposium on Computer Architecture
LINQits: big data on little clients
Proceedings of the 40th Annual International Symposium on Computer Architecture
Memory footprint matters: efficient equi-join algorithms for main memory data processing
Proceedings of the 4th annual Symposium on Cloud Computing
Revisiting co-processing for hash joins on the coupled CPU-GPU architecture
Proceedings of the VLDB Endowment
Adaptive and big data scale parallel execution in oracle
Proceedings of the VLDB Endowment
Design and evaluation of storage organizations for read-optimized main memory databases
Proceedings of the VLDB Endowment
OmniDB: towards portable and efficient query processing on parallel CPU/GPU architectures
Proceedings of the VLDB Endowment
Hardware-oblivious parallelism for in-memory column-stores
Proceedings of the VLDB Endowment
Meet the walkers: accelerating index traversals for in-memory databases
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hardware acceleration of database operations
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
Streaming similarity search over one billion tweets using parallel locality-sensitive hashing
Proceedings of the VLDB Endowment
Skew strikes back: new developments in the theory of join algorithms
ACM SIGMOD Record
Hi-index | 0.00 |
Join is an important database operation. As computer architectures evolve, the best join algorithm may change hand. This paper re-examines two popular join algorithms -- hash join and sort-merge join -- to determine if the latest computer architecture trends shift the tide that has favored hash join for many years. For a fair comparison, we implemented the most optimized parallel version of both algorithms on the latest Intel Core i7 platform. Both implementations scale well with the number of cores in the system and take advantages of latest processor features for performance. Our hash-based implementation achieves more than 100M tuples per second which is 17X faster than the best reported performance on CPUs and 8X faster than that reported for GPUs. Moreover, the performance of our hash join implementation is consistent over a wide range of input data sizes from 64K to 128M tuples and is not affected by data skew. We compare this implementation to our highly optimized sort-based implementation that achieves 47M to 80M tuples per second. We developed analytical models to study how both algorithms would scale with upcoming processor architecture trends. Our analysis projects that current architectural trends of wider SIMD, more cores, and smaller memory bandwidth per core imply better scalability potential for sort-merge join. Consequently, sort-merge join is likely to outperform hash join on upcoming chip multiprocessors. In summary, we offer multicore implementations of hash join and sort-merge join which consistently outperform all previously reported results. We further conclude that the tide that favors the hash join algorithm has not changed yet, but the change is just around the corner.