Parallel DBMSs support large-scale parallel query processing on partitioned data, but extending their reach to more general applications relies on User Defined Functions (UDFs). Existing UDF technology, however, is insufficient both conceptually and practically. A UDF is not a relation-in, relation-out operator, which restricts its ability to model complex applications defined on a set of tuples rather than on a single tuple, and to be composed with other relational operators in a query. Further, to interact efficiently with query execution, a UDF must be coded against DBMS-internal data structures and system calls, which is often beyond the expertise of an analytics application developer. To solve these problems, we first wrap general applications as Relation Valued Functions (RVFs); then, based on the notion of invocation patterns, we provide focused system support for efficiently integrating RVF execution into the query processing pipeline. We further distinguish system responsibility from user responsibility in RVF development by separating an RVF into the RVF-Shell, which handles system interaction, and the user-function, which contains pure application logic, so that the RVF-Shell can be constructed from high-level APIs. These mechanisms address the essential problems in supporting MapReduce and other analytics computation models inside a parallel database engine: modeling complex applications, integrating them into query processing, and shielding analytics developers from DBMS internal details. We prototyped these approaches on a commercial, proprietary parallel database engine, and our experience shows their practical value.
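The split between the RVF-Shell and the user-function can be illustrated with a small sketch. It is not the paper's API: the names `make_rvf`, `total_by_region`, and `select`, and the representation of a relation as an iterable of dictionaries, are all assumptions made for illustration. The sketch shows only the general idea of a relation-in, relation-out function whose system-facing part is kept apart from the application logic, so that it composes with other relational operators.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Assumption: a relation is modeled as an iterable of rows (dicts).
Row = Dict[str, object]
Relation = Iterable[Row]


def total_by_region(rows: List[Row]) -> List[Row]:
    """User-function (hypothetical): pure application logic over a set of tuples."""
    totals: Dict[str, float] = {}
    for row in rows:
        region = str(row["region"])
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return [{"region": r, "total": t} for r, t in sorted(totals.items())]


def make_rvf(user_function: Callable[[List[Row]], List[Row]]) -> Callable[[Relation], Iterator[Row]]:
    """RVF-Shell (hypothetical): handles the 'system' side of the function.

    Here the shell only buffers the input relation and streams the output
    rows; in a real engine it would deal with the engine's internal
    structures, which this sketch does not model.
    """
    def rvf(input_relation: Relation) -> Iterator[Row]:
        buffered = list(input_relation)      # gather the input tuples
        for out_row in user_function(buffered):
            yield out_row                    # hand results downstream one by one
    return rvf


def select(relation: Relation, predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """A stand-in relational operator used to show composition."""
    return (row for row in relation if predicate(row))


sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 25},
    {"region": "west", "amount": -3},
]

rvf = make_rvf(total_by_region)

# The RVF sits between two relational operators, like any other operator.
positive_sales = select(sales, lambda r: r["amount"] > 0)
result = list(select(rvf(positive_sales), lambda r: r["total"] >= 30))
```

Because the wrapped function both consumes and produces a relation, it can be placed anywhere in the operator chain, while `total_by_region` stays free of any system-level code.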