Extend core UDF framework for GPU-enabled analytical query evaluation

Authors:
Qiming Chen;Ren Wu;Meichun Hsu;Bin Zhang
Affiliations:
HP Labs, Palo Alto, CA;HP Labs, Palo Alto, CA;HP Labs, Palo Alto, CA;HP Labs, Palo Alto, CA
Venue:
Proceedings of the 15th Symposium on International Database Engineering & Applications
Year:
2011

Abstract

To achieve scalable data-intensive analytics, we investigate methods for integrating general-purpose analytic computation into a query pipeline using User Defined Functions (UDFs). However, an existing UDF cannot act as a block operator that takes chunk-wise input within the tuple-wise query processing pipeline. As a result, it cannot handle application semantics defined on a set of incoming tuples, such as tuples that represent a single object or fall within a time window, and it cannot leverage external computation engines for efficient batch processing. To enable a data-intensive computation pipeline, we introduce a new kind of UDF called the Set-In Set-Out (SISO) UDF. A SISO UDF is a block operator that processes input tuples and returns result tuples chunk by chunk. Operating in the query processing pipeline, a SISO UDF pools a chunk of input tuples, dispatches them in batch to GPUs or an analytic engine, materializes the results, and then streams them out. This behavior distinguishes the SISO UDF from existing UDFs and makes efficient integration of analytic computation and data management feasible. We have implemented the SISO UDF framework by extending the PostgreSQL query engine and demonstrated its use in GPU-enabled analytical query evaluation. Our experiments show that the proposed approach is scalable and efficient.
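
The pool, dispatch, materialize, and stream-out behavior described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' PostgreSQL implementation: the names `siso_udf`, `chunk_size`, `batch_fn`, and `center_chunk` are hypothetical, a plain Python function stands in for the GPU or analytic engine, and fixed-size chunking is assumed where the paper also allows chunks defined by object identity or time window.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def siso_udf(
    tuples: Iterable[T],
    chunk_size: int,
    batch_fn: Callable[[List[T]], List[R]],
) -> Iterator[R]:
    """Hypothetical Set-In Set-Out operator over a tuple stream.

    Pools up to `chunk_size` input tuples, hands the whole chunk to
    `batch_fn` (a stand-in for a GPU kernel or external analytic engine),
    materializes the batch result, then streams the result tuples out
    one at a time to the downstream consumer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    source = iter(tuples)
    while True:
        chunk = list(islice(source, chunk_size))  # pool one chunk of input
        if not chunk:
            return  # input exhausted
        results = batch_fn(chunk)  # dispatch the chunk in batch, materialize
        yield from results  # stream the results out tuple by tuple


def center_chunk(chunk: List[float]) -> List[float]:
    """Example set-level semantics: subtract the chunk mean from each value.

    This cannot be computed tuple by tuple, because the mean depends on
    every tuple in the chunk.
    """
    mean = sum(chunk) / len(chunk)
    return [x - mean for x in chunk]


centered = list(siso_udf([1.0, 2.0, 3.0, 10.0, 20.0], 3, center_chunk))
print(centered)
```

Because the sketch is a generator, the downstream consumer still pulls results tuple by tuple, while the batch function sees whole chunks. The last chunk may be shorter than `chunk_size`, so any batch function used this way has to accept partial chunks.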