SIGMOD '89 Proceedings of the 1989 ACM SIGMOD international conference on Management of data
Parallel database systems: the future of high performance database systems
Communications of the ACM
On optimal processor allocation to support pipelined hash joins
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Database Architecture Optimized for the New Bottleneck: Memory Access
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
DBMSs on a Modern Processor: Where Does Time Go?
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
What Happens During a Join? Dissecting CPU and Memory Optimization Effects
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Using Segmented Right-Deep Trees for the Execution of Pipelined Hash Joins
VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
Cache Conscious Algorithms for Relational Query Processing
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
C-store: a column-oriented DBMS
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Accelerating database operators using a network processor
DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware
Improving hash join performance through prefetching
ACM Transactions on Database Systems (TODS)
Multiprocessor hash-based join algorithms
VLDB '85 Proceedings of the 11th international conference on Very Large Data Bases - Volume 11
Generic database cost models for hierarchical memory systems
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Pipelined hash-join on multithreaded architectures
DaMoN '07 Proceedings of the 3rd international workshop on Data management on new hardware
Cache-oblivious databases: Limitations and opportunities
ACM Transactions on Database Systems (TODS)
Relational joins on graphics processors
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Relational query coprocessing on graphics processors
ACM Transactions on Database Systems (TODS)
Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs
Proceedings of the VLDB Endowment
MapCG: writing parallel program portable between CPU and GPU
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Database compression on graphics processors
Proceedings of the VLDB Endowment
High-throughput transaction executions on graphics processors
Proceedings of the VLDB Endowment
Design and evaluation of main memory hash join algorithms for multi-core CPUs
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
How soccer players would do stream joins
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Fast implementation of DGEMM on Fermi GPU
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
A Comprehensive Performance Comparison of CUDA and OpenCL
ICPP '11 Proceedings of the 2011 International Conference on Parallel Processing
DaMoN '12 Proceedings of the Eighth International Workshop on Data Management on New Hardware
Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems
ISPASS '12 Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software
Accelerating MapReduce on a coupled CPU-GPU architecture
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware
ICDE '13 Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013)
Red Fox: An Execution Environment for Relational Query Processing on GPUs
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors
Proceedings of Workshop on General Purpose Processing Using GPUs
Hi-index | 0.00 |
Query co-processing on graphics processors (GPUs) has become an effective means to improve the performance of main memory databases. However, the relatively low bandwidth and high latency of the PCI-e bus are usually bottleneck issues for co-processing. Recently, coupled CPU-GPU architectures have received a lot of attention, e.g. AMD APUs with the CPU and the GPU integrated into a single chip. That opens up new opportunities for optimizing query co-processing. In this paper, we experimentally revisit hash joins, one of the most important join algorithms for main memory databases, on a coupled CPU-GPU architecture. Particularly, we study the fine-grained co-processing mechanisms on hash joins with and without partitioning. The co-processing outlines an interesting design space. We extend existing cost models to automatically guide decisions on the design space. Our experimental results on a recent AMD APU show that (1) the coupled architecture enables fine-grained co-processing and cache reuses, which are inefficient on discrete CPU-GPU architectures; (2) the cost model can automatically guide the design and tuning knobs in the design space; (3) fine-grained co-processing achieves up to 53%, 35% and 28% performance improvement over CPU-only, GPU-only and conventional CPU-GPU co-processing, respectively. We believe that the insights and implications from this study are initial yet important for further research on query co-processing on coupled CPU-GPU architectures.