Many commercial database systems maintain histograms to efficiently estimate query selectivities as part of query optimization. Most work on histogram design is implicitly geared towards discrete or categorical attribute value domains. In this paper, we consider approaches that are better suited for the continuous-valued attributes commonly found in scientific and statistical databases. We propose two methods based on spline functions for estimating the selectivity of range queries over univariate and multivariate data. As the results from our experiments on both real and synthetic data sets demonstrate, the proposed methods achieve substantially lower estimation error (by a factor of up to 5.5) than state-of-the-art histograms, using exactly the same storage space and with comparable CPU runtime overhead; moreover, the advantage of the proposed spline methods grows when they are applied to multivariate data.
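To make the idea of spline-based selectivity estimation concrete, the following is a minimal sketch and not the paper's method, whose details the abstract does not give. It assumes a univariate real-valued column, summarizes its cumulative distribution with a small number of knots, and estimates a range query's selectivity as the difference of the interpolated CDF at the two endpoints. It uses a piecewise-linear (first-order) spline for brevity; a higher-order spline could be substituted. All function names and parameters here are hypothetical.

```python
import bisect
import random


def build_linear_spline_cdf(values, num_knots):
    """Summarize a column as (x, F(x)) knots taken at evenly spaced ranks.

    Assumption: num_knots >= 2 and values is non-empty.
    """
    data = sorted(values)
    n = len(data)
    knots = []
    for i in range(num_knots):
        rank = round(i * (n - 1) / (num_knots - 1))
        x = data[rank]
        if knots and x == knots[-1][0]:
            continue  # skip repeated knot positions so x is strictly increasing
        # Empirical CDF at x: fraction of values less than or equal to x.
        knots.append((x, bisect.bisect_right(data, x) / n))
    return knots


def cdf(knots, x):
    """Evaluate the piecewise-linear CDF approximation at x."""
    xs = [k[0] for k in knots]
    if x < xs[0]:
        return 0.0
    if x >= xs[-1]:
        return 1.0
    j = bisect.bisect_right(xs, x)  # xs[j - 1] <= x < xs[j]
    (x0, f0), (x1, f1) = knots[j - 1], knots[j]
    return f0 + (f1 - f0) * (x - x0) / (x1 - x0)


def estimate_selectivity(knots, lo, hi):
    """Estimated fraction of rows with lo <= value <= hi."""
    return max(0.0, cdf(knots, hi) - cdf(knots, lo))


if __name__ == "__main__":
    random.seed(0)
    column = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # synthetic data
    knots = build_linear_spline_cdf(column, num_knots=16)
    estimated = estimate_selectivity(knots, -1.0, 1.0)
    actual = sum(-1.0 <= v <= 1.0 for v in column) / len(column)
    print(f"estimated={estimated:.4f} actual={actual:.4f}")
```

The summary stores only `num_knots` pairs of numbers, so its size is fixed regardless of the column size, which is the same kind of storage budget a histogram works within.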