Statistical profile estimation in database systems
ACM Computing Surveys (CSUR)
Processing aggregate relational queries with hard time constraints
SIGMOD '89 Proceedings of the 1989 ACM SIGMOD international conference on Management of data
Estimating the size of generalized transitive closures
VLDB '89 Proceedings of the 15th international conference on Very large data bases
Practical selectivity estimation through adaptive sampling
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Statistical estimators for aggregate relational algebra queries
ACM Transactions on Database Systems (TODS)
Error-constrained COUNT query evaluation in relational databases
SIGMOD '91 Proceedings of the 1991 ACM SIGMOD international conference on Management of data
A Contingency Approach to Estimating Record Selectivities
IEEE Transactions on Software Engineering
Sequential sampling procedures for query size estimation
SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
Query size estimation by adaptive sampling (extended abstract)
PODS '90 Proceedings of the ninth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
VLDB '88 Proceedings of the 14th International Conference on Very Large Data Bases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Computation of partial query results with an adaptive stratified sampling technique
CIKM '95 Proceedings of the fourth international conference on Information and knowledge management
Bifocal sampling for skew-resistant join size estimation
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
The space complexity of approximating the frequency moments
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
A Hybrid Estimator for Selectivity Estimation
IEEE Transactions on Knowledge and Data Engineering
An integrated method for estimating selectivities in a multidatabase system
CASCON '93 Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research: distributed computing - Volume 2
Query cost estimation through remote system contention states analysis over the Internet
Web Intelligence and Agent Systems
Query evaluation and optimization in the semantic web
Theory and Practice of Logic Programming
ROS: run-time optimization of SPARQL queries
WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II
KNN based evolutionary techniques for updating query cost models
FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part II
Hi-index | 0.00 |
Sampling-based methods for estimating relation sizes after relational operators such as selections, joins and projections have been intensively studied in recent years. Methods of this type can achieve high estimation accuracy and efficiency. Since the dominating overhead involved in a sampling-based method is the sampling cost, different variants of sampling methods are proposed so as to minimize the sampling percentage (thus reducing the sampling cost) while maintaining the estimation accuracy in terms of the confidence level and relative error (to be precisely defined later in Section 2). In order to determine the minimal sampling percentage, the overall characteristics of the data such as the mean and variance are needed. Currently, the representative sampling-based methods in literature are based on the assumption that overall characteristics of data are unavailable, and thus a significant amount of effort is dedicated to estimating these characteristics so as to approach the optimal (minimal) sampling percentage. The estimation for these characteristics incurs cost as well as suffers the estimation error. In this short essay, we point out that the exact values of these characteristics of data can be kept track of in a database system at a negligible overhead. As a result, the minimal sampling percentage while ensuring the specified relative error and confidence level can be precisely determined.