This paper proposes a novel strategy, based on a regression model, for estimating the size of the relation that results from an equi-join and selection. An approximating series that represents the underlying data distribution and dependency is derived from the actual data. The proposed method then provides an instant and accurate size estimate by evaluating the series, which incurs no run-time overhead in page faults or space and only negligible CPU overhead. In contrast, the popular sampling methods incur run-time overheads in page faults (for sampling), CPU time, and space, and these overheads increase the response time of a query. The results of a comprehensive experimental study are also reported. They demonstrate that the estimation accuracy of the proposed method is comparable to that of the sampling methods, which are believed to provide the most accurate estimates. The proposed method appears well suited to retrieval-intensive database and information systems. Because the overhead of deriving the approximating series is fairly moderate, we believe the method also remains competitive when moderate or periodic updates are present.
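The general idea of replacing run-time sampling with the evaluation of a pre-fitted series can be illustrated with a minimal sketch. This is not the paper's estimator: it assumes a single numeric attribute, covers only a range selection (no equi-join), and substitutes an ordinary least-squares polynomial fitted to the empirical cumulative distribution for the paper's approximating series. The names `fit_series`, `solve`, `cdf`, and `estimate_range_count` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (M[r][m] - s) / M[r][r]
    return x


def fit_series(values, degree=3):
    """Offline step (assumption): fit a polynomial to the empirical CDF."""
    xs = sorted(values)
    n = len(xs)
    vmin, vmax = xs[0], xs[-1]
    span = (vmax - vmin) or 1  # avoid division by zero for constant data
    # Each point is (value scaled to [0, 1], fraction of tuples <= value).
    pts = [((v - vmin) / span, (i + 1) / n) for i, v in enumerate(xs)]
    m = degree + 1
    # Normal equations of the least-squares problem in the monomial basis.
    A = [[sum(x ** (r + c) for x, _ in pts) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in pts) for r in range(m)]
    return {"coeffs": solve(A, b), "n": n, "vmin": vmin, "span": span}


def cdf(model, v):
    """Evaluate the fitted series at v (Horner's rule), clamped to [0, 1]."""
    x = min(1.0, max(0.0, (v - model["vmin"]) / model["span"]))
    y = 0.0
    for c in reversed(model["coeffs"]):
        y = y * x + c
    return min(1.0, max(0.0, y))


def estimate_range_count(model, lo, hi):
    """Estimated number of tuples with lo < value <= hi; touches no data."""
    return model["n"] * max(0.0, cdf(model, hi) - cdf(model, lo))


# Usage: fit once offline, then estimate without reading the relation.
model = fit_series(range(1000))
print(estimate_range_count(model, 200, 500))
```

In this sketch, estimation costs only a few multiplications per query, while the fitting cost is paid offline and again whenever updates make the stored coefficients stale. That mirrors the trade-off the abstract describes, although the paper's actual series and its handling of joins and attribute dependency are not reproduced here.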