Equi-depth multidimensional histograms
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Processing aggregate relational queries with hard time constraints
SIGMOD '89 Proceedings of the 1989 ACM SIGMOD international conference on Management of data
Instance-based prediction of real-valued attributes
Computational Intelligence
Practical selectivity estimation through adaptive sampling
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Instance-Based Learning Algorithms
Machine Learning
On the propagation of errors in the size of join results
SIGMOD '91 Proceedings of the 1991 ACM SIGMOD international conference on Management of data
Error-constrained COUNT query evaluation in relational databases
SIGMOD '91 Proceedings of the 1991 ACM SIGMOD international conference on Management of data
Sequential sampling procedures for query size estimation
SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
Fixed-precision estimation of join selectivity
PODS '93 Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
On the estimation of join result sizes
EDBT '94 Proceedings of the 4th international conference on extending database technology: Advances in database technology
Adaptive selectivity estimation using query feedback
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Balancing histogram optimality and practicality for query result size estimation
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Statistical estimators for relational algebra expressions
Proceedings of the seventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
ACM Computing Surveys (CSUR)
Access path selection in a relational database management system
SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
Accurate estimation of the number of tuples satisfying a condition
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
An Evaluation of Sampling-Based Size Estimation Methods for Selections in Database Systems
ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Sampling-Based Selectivity Estimation for Joins Using Augmented Frequent Value Statistics
ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Universality of Serial Histograms
VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases
Query Size Estimation Using Machine Learning
Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA)
A study of instance-based algorithms for supervised learning tasks: mathematical, empirical, and psychological evaluations
An integrated method for estimating selectivities in a multidatabase system
CASCON '93 Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research: distributed computing - Volume 2
The VC-dimension of SQL queries and selectivity estimation through sampling
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
Hi-index | 0.01 |
We propose a new approach to the estimation of query result sizes for join queries. The technique, which we have called “systematic sampling—SYSSMP”, is a novel variant of the sampling-based approach. A key novelty of the systematic sampling is that it exploits the sortedness of data; the result of this is that the sample relation obtained well represents the underlying frequency distribution of the join attribute in the original relation.We first develop a theoretical foundation for systematic sampling which suggests that the method gives a more representative sample than the traditional simple random sampling. Subsequent experimental analysis on a range of synthetic relations confirms that the quality of sample relations yielded by systematic sampling is higher than those produced by the traditional simple random sampling.To ensure that sample relations produced by systematic sampling indeed assist in computing more accurate query result sizes, we compare systematic sampling with the most efficient simple random sampling called t_cross using a variety of relation configurations. The results obtained validate that systematic sampling uses the same amount of sampling but still provides more accurate query result sizes than t_cross. Furthermore, the extra sampling cost incurred by the use of systematic sampling pays off in a cheaper query execution cost at run-time.