Query sampling in DB2 Universal Database

Authors:
Jarek Gryz;Junjie Guo;Linqi Liu;Calisto Zuzarte
Affiliations:
York University;York University;IBM Toronto Lab;IBM Toronto Lab
Venue:
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Year:
2004

Citing 19

Logic-based approach to semantic query optimization
ACM Transactions on Database Systems (TODS)

On estimating the size of projections
ICDT '90 Proceedings of the third international conference on database theory on Database theory

Extensible/rule based query rewrite optimization in Starburst
SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data

On the relative cost of sampling for join selectivity estimation
PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

Implementation of magic-sets in a relational database system
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data

Bifocal sampling for skew-resistant join size estimation
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data

Online aggregation
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data

Random sampling for histogram construction: how much is enough?
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

On random sampling over joins
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data

Join synopses for approximate query answering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data

Congressional samples for approximate answering of group-by queries
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data

Online query processing: a tutorial
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data

Starburst Mid-Flight: As the Dust Clears
IEEE Transactions on Knowledge and Data Engineering

A Rule Engine for Query Transformation in Starburst and IBM DB2 C/S DBMS
ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering

Exploiting Uniqueness in Query Optimization
Proceedings of the Tenth International Conference on Data Engineering

Implementation of Two Semantic Query Optimization Techniques in DB2 Universal Database
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases

Including Group-By in Query Optimization
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases

Query Optimization by Predicate Move-Around
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases

A multi-dimensional histogram for selectivity estimation and fast approximate query answering
CASCON '03 Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative research

Cited 3

Fast approximate computation of statistics on views
Proceedings of the 2006 ACM SIGMOD international conference on Management of data

Sampling dirty data for matching attributes
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

A sampling algebra for aggregate estimation
Proceedings of the VLDB Endowment

Abstract

Executing ad hoc queries against large databases can be prohibitively expensive. Exploratory analysis of data may not require exact answers to queries, however: results based on sampling the data are often satisfactory. Supporting sampling as a primitive SQL operator turns out to be difficult because sampling does not commute with many SQL operators. In this paper, we describe an implementation in IBM® DB2® Universal Database (UDB) of a sampling operator that commutes with some SQL operators. As a result, the query with the sampling operator always returns a random sample of the answers and in many cases runs faster than it would have without such an operator.
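
To make the non-commutativity claim concrete, here is a minimal sketch in Python. It is an illustration only and is not taken from the paper: it assumes Bernoulli (row-level) sampling at rate `p` and uses duplicate elimination (`DISTINCT`) as the example operator. The function names and the sample column are hypothetical. The sketch computes, exactly and without random numbers, the probability that each distinct value appears in the result under the two possible operator orders.

```python
from collections import Counter


def sample_after_distinct(values, p):
    """Inclusion probability of each value when Bernoulli(p) sampling
    is applied to the output of DISTINCT: every distinct value is one
    row, so each is kept with probability p."""
    return {v: p for v in set(values)}


def distinct_after_sample(values, p):
    """Inclusion probability of each value when DISTINCT is applied to a
    Bernoulli(p) sample of the input: a value with m copies survives
    unless all m copies are dropped."""
    counts = Counter(values)
    return {v: 1 - (1 - p) ** m for v, m in counts.items()}


# Hypothetical skewed column: 'a' occurs nine times, 'b' once.
column = ["a"] * 9 + ["b"]
p = 0.1

uniform = sample_after_distinct(column, p)
skewed = distinct_after_sample(column, p)

for value in sorted(uniform):
    print(value, uniform[value], skewed[value])
```

Under these assumptions, sampling after `DISTINCT` keeps every distinct value with the same probability, which is what a random sample of the query answer requires. Sampling before `DISTINCT` favours frequent values, so pushing the sample below the operator changes the distribution of the result rather than merely its cost.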