Range Selectivity Estimation for Continuous Attributes

Authors: Flip Korn, Theodore Johnson, H. V. Jagadish
Venue: SSDBM '99: Proceedings of the 11th International Conference on Scientific and Statistical Database Management
Year: 1999

Abstract

Many commercial database systems maintain histograms so that the query optimizer can estimate query selectivities efficiently. Most work on histogram design is implicitly geared towards discrete or categorical attribute domains. In this paper, we consider approaches that are better suited to the continuous-valued attributes commonly found in scientific and statistical databases. We propose two methods based on spline functions for estimating the selectivity of range queries over univariate and multivariate data, and show that they are more accurate than histograms. Our experiments on both real and synthetic data sets show that the proposed methods achieve substantially lower estimation error (by up to a factor of 5.5) than state-of-the-art histograms, using exactly the same storage space and with comparable CPU runtime overhead. Moreover, the advantage of the spline methods grows when they are applied to multivariate data.
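The abstract does not say how the spline estimators are constructed, so the following Python sketch is illustrative only and should not be read as the authors' method. It assumes one plausible univariate design: store K knots of the empirical cumulative distribution function (CDF) at evenly spaced quantiles, interpolate them with a monotone cubic spline (the Fritsch-Carlson scheme, a swapped-in technique chosen here because it keeps the interpolated CDF non-decreasing), and estimate the selectivity of a range [a, b] as F(b) - F(a). For contrast, it includes the conventional equi-width histogram baseline, which assumes values are spread uniformly within each bucket. All names (`MonotoneCDFSpline`, `EquiWidthHistogram`, `true_selectivity`) are hypothetical.

```python
import bisect
import math
import random


def true_selectivity(data, a, b):
    """Exact fraction of values in the closed range [a, b]."""
    return sum(1 for v in data if a <= v <= b) / len(data)


class EquiWidthHistogram:
    """Baseline: equi-width buckets, values assumed uniform within each bucket."""

    def __init__(self, data, buckets):
        self.lo, self.hi, self.n = min(data), max(data), len(data)
        self.width = (self.hi - self.lo) / buckets or 1.0  # guard against hi == lo
        self.counts = [0] * buckets
        for v in data:
            self.counts[min(int((v - self.lo) / self.width), buckets - 1)] += 1

    def selectivity(self, a, b):
        if a > b:
            return 0.0
        total = 0.0
        for i, c in enumerate(self.counts):
            left = self.lo + i * self.width
            overlap = min(b, left + self.width) - max(a, left)
            if overlap > 0:
                total += c * overlap / self.width  # uniform-spread assumption
        return total / self.n


class MonotoneCDFSpline:
    """Hypothetical spline estimator: a Fritsch-Carlson monotone cubic
    interpolant of the empirical CDF through `knots` quantile points."""

    def __init__(self, data, knots):
        s, n = sorted(data), len(data)
        if n < 2 or knots < 2:
            raise ValueError("need at least two values and two knots")
        xs, ys = [], []
        for j in range(knots):
            idx = round(j * (n - 1) / (knots - 1))
            x, y = s[idx], idx / (n - 1)  # CDF runs from 0 at min to 1 at max
            if xs and x == xs[-1]:
                ys[-1] = y  # duplicate knot: keep the larger CDF value
            else:
                xs.append(x)
                ys.append(y)
        if len(xs) < 2:
            raise ValueError("need at least two distinct knot positions")
        self.xs, self.ys = xs, ys
        self.ms = self._tangents(xs, ys)

    @staticmethod
    def _tangents(xs, ys):
        k = len(xs)
        d = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(k - 1)]
        m = [d[0]] + [(d[i - 1] + d[i]) / 2 if d[i - 1] * d[i] > 0 else 0.0
                      for i in range(1, k - 1)] + [d[-1]]
        for i in range(k - 1):  # rescale tangents so each piece stays monotone
            if d[i] == 0:
                m[i] = m[i + 1] = 0.0
            else:
                al, be = m[i] / d[i], m[i + 1] / d[i]
                r = al * al + be * be
                if r > 9:
                    t = 3 / math.sqrt(r)
                    m[i], m[i + 1] = t * al * d[i], t * be * d[i]
        return m

    def cdf(self, x):
        xs, ys, ms = self.xs, self.ys, self.ms
        if x < xs[0]:
            return 0.0
        if x >= xs[-1]:
            return 1.0
        i = bisect.bisect_right(xs, x) - 1
        h = xs[i + 1] - xs[i]
        t = (x - xs[i]) / h
        # Cubic Hermite basis functions on the knot interval [xs[i], xs[i+1]].
        h00, h10 = 2 * t**3 - 3 * t**2 + 1, t**3 - 2 * t**2 + t
        h01, h11 = -2 * t**3 + 3 * t**2, t**3 - t**2
        return h00 * ys[i] + h10 * h * ms[i] + h01 * ys[i + 1] + h11 * h * ms[i + 1]

    def selectivity(self, a, b):
        if a > b:
            return 0.0
        return max(0.0, min(1.0, self.cdf(b) - self.cdf(a)))


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0, 1) for _ in range(10_000)]
    knots = 16
    spline = MonotoneCDFSpline(data, knots)
    hist = EquiWidthHistogram(data, buckets=2 * knots - 2)  # about 2K stored numbers each
    for a, b in [(-0.5, 0.5), (1.0, 2.0), (-3.0, -1.5)]:
        print(f"[{a:+.1f}, {b:+.1f}] true={true_selectivity(data, a, b):.4f} "
              f"spline={spline.selectivity(a, b):.4f} hist={hist.selectivity(a, b):.4f}")
```

In this sketch, K knots cost 2K stored numbers (x and CDF value; tangents are recomputed), so giving the histogram 2K - 2 buckets plus its minimum and maximum roughly equalizes storage. That accounting, like the choice of quantile knots and monotone cubic interpolation, is an assumption made for illustration; the sketch also covers only the univariate case.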