Prior work has shown dramatic acceleration for various database operations on GPUs, but only using primitives that are not part of conventional database languages such as SQL. This paper implements a subset of the SQLite command processor directly on the GPU. Doing so dramatically reduces the effort required to achieve GPU acceleration, because database programmers need neither learn new programming languages such as CUDA nor modify their programs to use non-SQL libraries. The paper focuses on accelerating SELECT queries and describes the considerations involved in an efficient GPU implementation of the SQLite command processor. On an NVIDIA Tesla C1060, the implementation achieves speedups of 20-70X, depending on the size of the result set.
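To make the idea of a data-parallel SELECT more concrete, the following is a minimal CPU-side sketch in Python of one common pattern: test every row independently, compute an exclusive prefix sum over the match flags, and scatter the matching rows into a compact result. This is an illustration under stated assumptions, not the paper's implementation. The function `select_where`, the two-column `(id, price)` schema, and the use of a prefix sum for result compaction are all hypothetical choices made for this example; on a GPU, each loop iteration in steps 1 and 3 would correspond to one thread.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical schema for illustration only: (id, price).
Row = Tuple[int, float]


def select_where(
    rows: Sequence[Row],
    predicate: Callable[[Row], bool],
    columns: Sequence[int],
) -> List[tuple]:
    """Emulate SELECT <columns> FROM rows WHERE predicate(row)."""
    # Step 1: evaluate the predicate on every row independently.
    # On a GPU, each row would be handled by its own thread.
    flags = [1 if predicate(row) else 0 for row in rows]

    # Step 2: an exclusive prefix sum over the flags assigns each
    # matching row a unique slot in the output.
    offsets = []
    total = 0
    for flag in flags:
        offsets.append(total)
        total += flag

    # Step 3: scatter the projected columns of matching rows into
    # their slots. The output size equals the number of matches.
    result: List[tuple] = [()] * total
    for i, row in enumerate(rows):
        if flags[i]:
            result[offsets[i]] = tuple(row[c] for c in columns)
    return result


# Roughly: SELECT id FROM t WHERE price > 10.0
table = [(1, 5.0), (2, 15.0), (3, 25.0), (4, 8.0)]
matches = select_where(table, lambda r: r[1] > 10.0, columns=[0])
```

In this sketch the amount of data written in the final step grows with the number of matching rows, so the cost of a query depends on the size of its result set as well as on the size of the table.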