Efficient selectivity estimation by histogram construction based on subspace clustering

Authors:
Andranik Khachatryan; Emmanuel Müller; Klemens Böhm; Jonida Kopper
Affiliations:
Institute for Program Structures and Data Organization, Karlsruhe Institute of Technology, Germany (all authors)
Venue:
SSDBM '11: Proceedings of the 23rd International Conference on Scientific and Statistical Database Management
Year:
2011

Abstract

Modern databases have to cope with multi-dimensional queries. For efficient processing of these queries, query optimization relies on multi-dimensional selectivity estimation techniques. These techniques in turn typically rely on histograms. A core challenge of histogram construction is the detection of regions whose density is higher than that of their surroundings. In this paper, we show that subspace clustering algorithms, which detect such regions, can be used to build high-quality histograms in multi-dimensional spaces. The clusters are transformed into a memory-efficient histogram representation, while preserving most of the information needed for selectivity estimation. We derive a formal criterion for our transformation of clusters into buckets that minimizes the introduced estimation error. In practice, finding optimal buckets is hard, so we propose a heuristic. Our experiments show that our approach is efficient in terms of both runtime and memory usage. Overall, we demonstrate that subspace clustering enables multi-dimensional selectivity estimation with low estimation errors.
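
The abstract does not spell out how a bucket-based histogram answers a range query. The following is a minimal sketch of the general idea, assuming axis-aligned rectangular buckets that do not overlap and a uniform distribution of tuples inside each bucket. This is a common textbook simplification and not necessarily the estimator used in the paper. The names `Bucket`, `volume`, `intersection`, and `estimate_selectivity` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# One (low, high) interval per dimension; an assumed representation.
Box = Sequence[Tuple[float, float]]


@dataclass
class Bucket:
    box: Box      # axis-aligned rectangle covered by the bucket
    count: float  # number of tuples that fall into the bucket


def volume(box: Box) -> float:
    """Volume of a box; empty or inverted intervals contribute zero."""
    v = 1.0
    for low, high in box:
        v *= max(0.0, high - low)
    return v


def intersection(a: Box, b: Box) -> Box:
    """Per-dimension overlap of two boxes (may contain inverted intervals)."""
    return [(max(al, bl), min(ah, bh)) for (al, ah), (bl, bh) in zip(a, b)]


def estimate_selectivity(buckets: List[Bucket], query: Box, total: float) -> float:
    """Estimate the fraction of tuples inside `query`.

    Assumes tuples are spread uniformly within each bucket, so a bucket
    contributes its count scaled by the covered share of its volume.
    """
    estimated_tuples = 0.0
    for bucket in buckets:
        bucket_volume = volume(bucket.box)
        if bucket_volume == 0.0:
            continue  # degenerate bucket; skipped in this sketch
        overlap = volume(intersection(bucket.box, query))
        estimated_tuples += bucket.count * overlap / bucket_volume
    return estimated_tuples / total
```

Under these assumptions, a dense bucket that tightly encloses a cluster yields a more accurate estimate than one large bucket that averages the cluster with its sparse surroundings, which is why detecting dense regions matters for histogram construction.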