On the relative cost of sampling for join selectivity estimation
PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
We compare the performance of sampling-based procedures for estimation of the selectivity of an equijoin. While some of the procedures have been proposed in the database sampling literature, their relative performance has never been analyzed. A main result of this paper is a partial ordering that compares the variability of the estimators for the different procedures after an arbitrary fixed number of sampling steps. Prior to the current work, it was also unknown whether these fixed-step estimation procedures can be extended to asymptotically efficient fixed-precision estimation procedures. Our second main result is a general method for such an extension and a proof that the method is valid for all the estimation procedures under consideration. Finally, we show that, under reasonable assumptions on sampling costs, the partial ordering on the variability of the fixed-step estimation procedures implies a partial ordering on the cost of the corresponding fixed-precision estimation procedures. These results lead to a new algorithm for fixed-precision estimation of the selectivity of an equijoin. The algorithm appears to be the best available when there are no indices on the join key. Our results can be extended to general select-join queries.
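To make the notion of a fixed-step estimation procedure concrete, the following is a minimal sketch of one simple sampling scheme: draw a fixed number of random tuple pairs from the two relations and report the fraction whose join keys match. It is an illustration only and is not claimed to be any of the specific procedures the paper compares. The function names, the representation of each relation as a list of join-key values, and the `steps` parameter are all assumptions introduced here.

```python
import random
from collections import Counter


def estimate_equijoin_selectivity(r_keys, s_keys, steps, rng=None):
    """Fixed-step estimate of equijoin selectivity (illustrative sketch).

    r_keys, s_keys: join-key values of the tuples in relations R and S
    (a hypothetical in-memory representation).
    steps: fixed number of sampling steps, one random tuple pair per step.
    """
    rng = rng or random.Random()
    hits = 0
    for _ in range(steps):
        # Draw one tuple from each relation uniformly, with replacement.
        if rng.choice(r_keys) == rng.choice(s_keys):
            hits += 1
    # Fraction of sampled pairs that join: unbiased for |R join S| / (|R| * |S|).
    return hits / steps


def exact_equijoin_selectivity(r_keys, s_keys):
    """Exact selectivity, computed from key frequencies for comparison."""
    r_counts = Counter(r_keys)
    s_counts = Counter(s_keys)
    join_size = sum(count * s_counts[key] for key, count in r_counts.items())
    return join_size / (len(r_keys) * len(s_keys))


if __name__ == "__main__":
    r = [1, 2, 2, 3]
    s = [2, 2, 4]
    print("exact:   ", exact_equijoin_selectivity(r, s))
    print("estimate:", estimate_equijoin_selectivity(r, s, 20000, random.Random(0)))
```

The variance of this estimator after a fixed number of steps depends on how the sample is drawn, which is the kind of quantity the paper's partial ordering compares across procedures. A fixed-precision procedure would instead keep sampling until a stopping rule indicates the target precision has been reached. That stopping rule is not sketched here.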