Self-tuning cost modeling of user-defined functions in an object-relational DBMS

Authors:
Zhen He;Byung Suk Lee;Robert Snapp
Affiliations:
La Trobe University, Australia;University of Vermont, Burlington VT;University of Vermont, Burlington VT
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2005

Abstract

Query optimizers in object-relational database management systems typically require users to provide the execution cost models of user-defined functions (UDFs). Despite this need, little work has been done on providing such models. The existing approaches are static in that they require users to train the model a priori with pregenerated UDF execution cost data. Static approaches cannot adapt to changing UDF execution patterns, and thus degrade in accuracy when the UDF executions used for generating training data do not reflect the patterns of those performed during operation. This article proposes a new approach, based on the recent trend toward self-tuning DBMSs, in which the cost model is maintained dynamically and incrementally as UDFs are executed online. In the context of UDF cost modeling, this approach faces several challenges: it must work with limited memory, work with limited computation time, and adjust to fluctuations in execution costs (e.g., caching effects). In this article, we first provide a set of guidelines for developing techniques that meet these challenges while achieving accurate and fast cost prediction with small overheads. Then, we present two concrete techniques developed under the guidelines. One is an instance-based technique built on the conventional k-nearest neighbor (KNN) technique, which uses a multidimensional index such as the R*-tree. The other is a summary-based technique, which uses a quadtree to store summary values at multiple resolutions. We have performed extensive performance evaluations comparing these two techniques against existing histogram-based techniques and the KNN technique, using both real and synthetic UDFs and data sets. The results show that our techniques provide better performance in most situations considered.
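To make the summary-based idea more concrete, the following is a minimal sketch of how a quadtree might keep cost summaries at multiple resolutions and update them as UDF executions are observed. It is an illustration only, not the authors' implementation: the class and method names (`QuadtreeCostModel`, `observe`, `predict`), the choice of a running sum and count as the summary values, the two-argument UDF normalized to the unit square, and the `max_depth` and `min_count` parameters are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Node:
    # Running summary of every observed cost that fell inside this cell.
    total: float = 0.0
    count: int = 0
    children: Optional[List["Node"]] = None  # four quadrants, created lazily


class QuadtreeCostModel:
    """Hypothetical cost model for a UDF with two arguments in [0, 1)."""

    def __init__(self, max_depth: int = 4, min_count: int = 3) -> None:
        self.root = Node()
        self.max_depth = max_depth  # bounds the depth, and so the memory used
        self.min_count = min_count  # observations needed to trust a sub-cell

    def _path(self, x: float, y: float, create: bool) -> Iterator[Node]:
        """Yield the nodes from the root toward the finest cell holding (x, y)."""
        node, cx, cy, half = self.root, 0.5, 0.5, 0.25
        yield node
        for _ in range(self.max_depth):
            if node.children is None:
                if not create:
                    return
                node.children = [Node() for _ in range(4)]
            # Quadrant index: bit 0 is set for the right half, bit 1 for the top half.
            q = (1 if x >= cx else 0) + (2 if y >= cy else 0)
            node = node.children[q]
            cx += half if q & 1 else -half
            cy += half if q & 2 else -half
            half /= 2
            yield node

    def observe(self, x: float, y: float, cost: float) -> None:
        """Fold one measured execution cost into every resolution level."""
        for node in self._path(x, y, create=True):
            node.total += cost
            node.count += 1

    def predict(self, x: float, y: float) -> Optional[float]:
        """Return the mean cost of the finest cell that has enough observations."""
        best = None
        for depth, node in enumerate(self._path(x, y, create=False)):
            # The root is used as soon as it has any data; finer cells must
            # reach min_count before they override a coarser estimate.
            needed = 1 if depth == 0 else self.min_count
            if node.count >= needed:
                best = node.total / node.count
        return best


model = QuadtreeCostModel(max_depth=3, min_count=2)
model.observe(0.10, 0.10, 10.0)
model.observe(0.12, 0.11, 14.0)
model.observe(0.90, 0.90, 100.0)
near = model.predict(0.10, 0.10)  # served by a fine cell holding two points
far = model.predict(0.90, 0.90)   # falls back to the coarse root average
```

In this sketch a prediction falls back to a coarser cell whenever the finer one has seen too few executions, which is one plausible way to trade accuracy against the limited memory and computation time the abstract mentions. How the article's technique chooses its summary values and resolutions is not stated in the abstract.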