As the amount of memory in database systems grows, entire database tables, or even whole databases, can fit in the system's memory, making in-memory database operations more prevalent. This shift from disk-based to in-memory database systems has contributed to a move from row-wise to columnar data storage. Furthermore, common database workloads have grown beyond online transaction processing (OLTP) to include online analytical processing (OLAP) and data mining. These workloads analyze huge datasets that are often irregular and unindexed, which makes traditional database operations such as joins much more expensive. In this paper we explore using dedicated hardware to accelerate in-memory database operations. We present hardware that accelerates three primitives: selection, which compacts a single column into a contiguous column of the selected data; joining two sorted columns by merging them; and sorting a column. Finally, we combine these primitives to accelerate an entire join operation. We implement a prototype of this system on FPGAs and show substantial improvements in both absolute throughput and memory bandwidth utilization. Using the prototype as a guide, we explore how the hardware resources required by our design vary with the desired throughput.
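The primitives described above, and their composition into a sort-merge join, can be modeled in software to show what each stage computes. The following is a minimal Python sketch under stated assumptions: each column is a list of integer keys, selection carries each value's original row id along so that later stages can refer back to the table, and the function names (`select`, `merge_runs`, `merge_sort`, `merge_join`, `join`) are hypothetical. The sort is an ordinary bottom-up merge sort used as a stand-in. It is not a model of the FPGA design.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a selected row is (key, original_row_id).
Row = Tuple[int, int]


def select(column: List[int], predicate: Callable[[int], bool]) -> List[Row]:
    """Selection: compact the rows that satisfy `predicate` into a dense list."""
    return [(value, row_id) for row_id, value in enumerate(column) if predicate(value)]


def merge_runs(a: List[Row], b: List[Row]) -> List[Row]:
    """Merge two sorted runs into one sorted run (tuples compare by key, then row id)."""
    out: List[Row] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def merge_sort(rows: List[Row]) -> List[Row]:
    """Sort: bottom-up merge sort, doubling the run length on each pass."""
    runs = [[r] for r in rows]
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                merged.append(merge_runs(runs[i], runs[i + 1]))
            else:
                merged.append(runs[i])  # odd run out: carry it to the next pass
        runs = merged
    return runs[0] if runs else []


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, int, int]]:
    """Merge join of two key-sorted columns; emits (key, left_row_id, right_row_id)."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, left_id in left[i:i_end]:
                for _, right_id in right[j:j_end]:
                    out.append((lk, left_id, right_id))
            i, j = i_end, j_end
    return out


def join(left_col: List[int], right_col: List[int],
         left_pred: Callable[[int], bool],
         right_pred: Callable[[int], bool]) -> List[Tuple[int, int, int]]:
    """Full join pipeline: select each column, sort each result, then merge join."""
    left = merge_sort(select(left_col, left_pred))
    right = merge_sort(select(right_col, right_pred))
    return merge_join(left, right)


if __name__ == "__main__":
    print(join([5, 3, 8, 3, 1], [3, 9, 5, 3], lambda k: k < 8, lambda k: True))
```

Running the script joins the two example columns after filtering out keys of 8 or more on the left side. Each stage consumes and produces a flat, sequentially accessed column. Streaming access of this kind is what makes these operations plausible candidates for dedicated hardware.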