A user-defined function (UDF) is a powerful database feature that allows users to customize database functionality. Though useful, current UDFs have numerous limitations, including install-time specification of input and output schema and limited support for parallel execution. We present a new approach to implementing a UDF, which we call SQL/MapReduce (SQL/MR), that overcomes many of these limitations. We leverage ideas from the MapReduce programming paradigm to provide users with a straightforward API through which they can implement a UDF in the language of their choice. Moreover, our approach is highly flexible because the output schema of the UDF is specified by the function itself at query plan-time. A SQL/MR function is therefore polymorphic: it can process arbitrary input because both its behavior and its output schema are determined dynamically from information available at query plan-time, such as the function's input schema and arbitrary user-provided parameters. This also increases reusability, as the same SQL/MR function can be used on inputs with many different schemas or with different user-specified parameters. In this paper, we describe the motivation for this new approach to UDFs as well as its implementation within Aster Data Systems' nCluster database. We demonstrate that, in the context of massively parallel, shared-nothing database systems, this model of computation facilitates highly scalable computation within the database. We also include examples of new applications that take advantage of this novel UDF framework.
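
The plan-time polymorphism described in the abstract can be illustrated with a small sketch. This is not the SQL/MR API or nCluster code: the names `Contract`, `project_columns`, and `run_partitioned`, the `columns` parameter, and the schema representation are all hypothetical, chosen only to show a function that derives its own output schema from the input schema and user parameters before any rows are processed, and whose row-level work can then run independently on each partition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical representation: a schema is a list of (column name, type name) pairs.
Schema = List[Tuple[str, str]]
Row = Tuple


@dataclass
class Contract:
    """Outcome of the plan-time step: the output schema plus a row operator."""
    output_schema: Schema
    operate: Callable[[Iterable[Row]], Iterator[Row]]


def project_columns(input_schema: Schema, params: Dict[str, str]) -> Contract:
    """Hypothetical polymorphic function that keeps the columns named in params['columns'].

    It is called once, before execution, with the input schema and the user's
    parameters, and it decides its own output schema from them.
    """
    wanted = [c.strip() for c in params["columns"].split(",")]
    names = [name for name, _ in input_schema]
    missing = [c for c in wanted if c not in names]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    positions = [names.index(c) for c in wanted]
    output_schema = [input_schema[i] for i in positions]

    def operate(rows: Iterable[Row]) -> Iterator[Row]:
        # Row-at-a-time work with no cross-row state, so partitions are independent.
        for row in rows:
            yield tuple(row[i] for i in positions)

    return Contract(output_schema, operate)


def run_partitioned(contract: Contract, partitions: List[List[Row]]) -> List[List[Row]]:
    """Apply the row operator to each partition separately (a stand-in for parallel workers)."""
    return [list(contract.operate(p)) for p in partitions]


# The same function works on any input schema; only the parameters change.
clicks_schema = [("user_id", "int"), ("url", "varchar"), ("ts", "timestamp")]
contract = project_columns(clicks_schema, {"columns": "ts, user_id"})
result = run_partitioned(contract, [[(1, "/a", 100)], [(2, "/b", 200)]])
```

Because the output schema is computed from the arguments rather than fixed at install time, the same hypothetical function could be reused on a table with entirely different columns, which is the reusability property the abstract attributes to SQL/MR functions.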