Selectivity Estimation for Joins Using Systematic Sampling

Authors:
Banchong Harangsri;John Shepherd;Anne Ngu
Venue:
DEXA '97 Proceedings of the 8th International Workshop on Database and Expert Systems Applications
Year:
1997

Citing 0
Cited 4

Containment join size estimation: models and methods
Proceedings of the 2003 ACM SIGMOD international conference on Management of data

Distance-join: pattern match query in a large graph database
Proceedings of the VLDB Endowment

IRSJ: incremental refining spatial joins for interactive queries in GIS
Geoinformatica

Answering pattern match queries in large graph databases via graph embedding
The VLDB Journal — The International Journal on Very Large Data Bases

Abstract

We propose a new approach to the estimation of join selectivity. The technique, which we call "systematic sampling", is a novel variant of the sampling-based approach. Systematic sampling works as follows: given a relation R of N tuples, with a join attribute that can be accessed in ascending or descending order via an index, and given n, the number of tuples to be sampled from R, select a tuple at random from the first $k = \lceil N/n \rceil$ tuples of R and every $k$th tuple thereafter.

We first develop a theoretical foundation for systematic sampling, which suggests that the method gives a more representative sample than traditional simple random sampling. Subsequent experimental analysis on a range of synthetic relations confirms that the sample relations (participating in a join) yielded by systematic sampling are of higher quality than those produced by traditional simple random sampling.

To verify that the sample relations produced by systematic sampling do lead to more accurate join selectivity estimates, we compare systematic sampling with the most efficient simple random sampling method, called t_cross, using a variety of star joins and a variety of relation configurations. The results demonstrate that, with the same amount of sampling, systematic sampling can provide considerably more accurate join selectivities than t_cross sampling.
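The sampling step described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a relation is modeled as a list of join-attribute values already in index order, and the names `systematic_sample` and `sample_join_selectivity` are hypothetical. The selectivity estimator is a naive matches-over-cross-product ratio chosen for illustration; it is not the t_cross procedure, which the abstract does not specify.

```python
import math
import random
from collections import Counter


def systematic_sample(ordered_tuples, n, rng=random):
    """Pick a random start among the first k = ceil(N/n) tuples,
    then take every k-th tuple thereafter."""
    N = len(ordered_tuples)
    if N == 0 or n <= 0:
        return []
    k = math.ceil(N / n)
    start = rng.randrange(k)  # 0-based position within the first k tuples
    return ordered_tuples[start::k]


def sample_join_selectivity(sample_r, sample_s):
    """Naive equi-join selectivity estimate (an assumption, not t_cross):
    matching pairs in the samples divided by |sample_r| * |sample_s|."""
    if not sample_r or not sample_s:
        return 0.0
    counts = Counter(sample_s)
    matches = sum(counts[v] for v in sample_r)
    return matches / (len(sample_r) * len(sample_s))


# Synthetic join-attribute values, sorted to mimic access through an index.
R = sorted(random.Random(0).choices(range(100), k=1000))
S = sorted(random.Random(1).choices(range(100), k=800))

r = systematic_sample(R, 50, random.Random(2))
s = systematic_sample(S, 40, random.Random(3))
estimate = sample_join_selectivity(r, s)
```

Because the stride is rounded up, the sample can contain slightly fewer than n tuples when N is not a multiple of n; for example, N = 10 and n = 3 give k = 4 and either two or three sampled tuples depending on the random start.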