Block Oriented Processing of Relational Database Operations in Modern Computer Architectures
Proceedings of the 17th International Conference on Data Engineering
User-Defined Table Operators: Enhancing Extensibility for ORDBMS
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Aurora: a new model and architecture for data stream management
The VLDB Journal — The International Journal on Very Large Data Bases
Buffering databse operations for enhanced instruction cache performance
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Scientific data management in the coming decade
ACM SIGMOD Record
The CQL continuous query language: semantic foundations and query execution
The VLDB Journal — The International Journal on Very Large Data Bases
Vector and matrix operations programmed with UDFs in a relational DBMS
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Map-reduce-merge: simplified relational data processing on large clusters
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Clustera: an integrated computation and data management system
Proceedings of the VLDB Endowment
SCOPE: easy and efficient parallel processing of massive data sets
Proceedings of the VLDB Endowment
Exploiting the power of relational databases for efficient stream processing
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Extend UDF Technology for Integrated Analytics
DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery
Efficiently support MapReduce-like computation models inside parallel DBMS
IDEAS '09 Proceedings of the 2009 International Database Engineering & Applications Symposium
Experience in extending query engine for continuous analytics
DaWaK'10 Proceedings of the 12th international conference on Data warehousing and knowledge discovery
GPU-accelerated predicate evaluation on column store
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Generalized UDF for analytics inside database engine
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Experience in Continuous analytics as a Service (CaaaS)
Proceedings of the 14th International Conference on Extending Database Technology
Hi-index | 0.00 |
To achieve scalable data intensive analytics, we investigate methods to integrate general purpose analytic computation into a query pipeline using User Defined Functions (UDFs). However, an existing UDF cannot act as a block operator with chunk-wise input along the tuple-wise query processing pipeline, therefore unable to deal with the application semantics definable on the set of incoming tuples representing a single object or falling in a time window, and unable to leverage external computation engines for efficient batch processing. To enable the data intensive computation pipeline, we introduce a new kind of UDFs called Set-In Set-Out (SISO) UDFs. A SISO UDF is a block operator for processing the input tuples and returning the resulting tuples chunk by chunk. Operated in the query processing pipeline, a SISO UDF pools a chunk of input tuples, dispatches them to GPUs or an analytic engine in batch, materializes and then streams out the results. This behavior differentiates SISO UDF from all the existing ones, and makes efficient integration of analytic computation and data management feasible. We have implemented the SISO UDF framework by extending the PostgreSQL query engine, and further demonstrated the use of SISO UDF with GPU-enabled analytical query evaluation. Our experiments show that the proposed approach is scalable and efficient.