Implementations of data processing operators on GPU processors have achieved significant performance improvements over their multicore CPU counterparts. To achieve maximum performance, database operator implementations must take into consideration special features of GPU architectures. A crucial difference is that the unit of execution is a group ("warp") of threads (32 threads in our target architecture), whereas on CPUs it is a single thread. In the presence of branches, all threads in a warp have to follow the same execution path; if some threads diverge, the different paths are serialized. Additionally, as on CPUs, branches degrade the efficiency of instruction scheduling. Here, we study conjunctive selection queries where branching hurts performance. We compute the optimal execution plan for a conjunctive query, taking branch penalties into account, and consider both single-kernel and multi-kernel plans. Our evaluation suggests that divergence affects performance significantly and that our techniques reduce resource underutilization and improve operator performance.
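To make the branching trade-off concrete, the following is a minimal CPU-side sketch of two ways to evaluate a two-predicate conjunctive selection. It is not the paper's GPU implementation or its plan optimizer. The function names, the column layout (two `int` columns of equal length), and the thresholds `a_max` and `b_max` are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Branching plan (illustrative): short-circuit && skips the second predicate
// when the first fails, but each test is a conditional branch. On a GPU,
// threads of one warp that take different sides of such a branch would be
// serialized.
// Assumes a.size() == b.size().
std::vector<std::uint32_t> select_branching(const std::vector<int>& a,
                                            const std::vector<int>& b,
                                            int a_max, int b_max) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] < a_max && b[i] < b_max) {
            out.push_back(static_cast<std::uint32_t>(i));
        }
    }
    return out;
}

// Branch-free plan (illustrative): always evaluate both predicates, always
// write the candidate row index, and advance the output cursor by 0 or 1.
// This does more work per row but has no data-dependent branch in the loop body.
// Assumes a.size() == b.size().
std::vector<std::uint32_t> select_branch_free(const std::vector<int>& a,
                                              const std::vector<int>& b,
                                              int a_max, int b_max) {
    std::vector<std::uint32_t> out(a.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[n] = static_cast<std::uint32_t>(i);
        // Bitwise & on the two bool results avoids short-circuit evaluation.
        n += static_cast<std::size_t>((a[i] < a_max) & (b[i] < b_max));
    }
    out.resize(n);
    return out;
}
```

Both functions return the same row indices; which one is faster depends on predicate selectivity and on the cost of a mispredicted or divergent branch, which is the kind of trade-off a plan optimizer would weigh.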