Efficient sampling strategies for relational database operations
ICDT Selected papers of the 4th international conference on Database theory
BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Bifocal sampling for skew-resistant join size estimation
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Improved histograms for selectivity estimation of range predicates
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
The space complexity of approximating the frequency moments
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
Automatic subspace clustering of high dimensional data for data mining applications
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Wavelet-based histograms for selectivity estimation
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Self-tuning histograms: building histograms without looking at data
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Join synopses for approximate query answering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Independence is good: dependency-based histogram synopses for high-dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Selectivity estimation using probabilistic models
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Processing complex aggregate queries over data streams
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Approximate Query Processing Using Wavelets
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
Rk-hist: an r-tree based histogram for multi-dimensional selectivity estimation
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Depth estimation for ranking query optimization
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Linked Bernoulli Synopses: Sampling along Foreign Keys
SSDBM '08 Proceedings of the 20th international conference on Scientific and Statistical Database Management
TuG synopses for approximate query answering
ACM Transactions on Database Systems (TODS)
Depth estimation for ranking query optimization
The VLDB Journal — The International Journal on Very Large Data Bases
Deriving predicate statistics in datalog
Proceedings of the 12th international ACM SIGPLAN symposium on Principles and practice of declarative programming
Faster query answering in probabilistic databases using read-once functions
Proceedings of the 14th International Conference on Database Theory
Regression on evolving multi-relational data streams
Proceedings of the 2011 Joint EDBT/ICDT Ph.D. Workshop
Optimizing incremental maintenance of minimal bisimulation of cyclic graphs
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I
Deriving predicate statistics for logic rules
RR'12 Proceedings of the 6th international conference on Web Reasoning and Rule Systems
Efficiently adapting graphical models for selectivity estimation
The VLDB Journal — The International Journal on Very Large Data Bases
Selectivity estimation for hybrid queries over text-rich data graphs
Proceedings of the 16th International Conference on Extending Database Technology
CS2: a new database synopsis for query estimation
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
xPAD: a platform for analytic data flows
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Hi-index | 0.00 |
This paper introduces the Tuple Graph (TUG) synopses, a new class of data summaries that enable accurate selectivity estimates for complex relational queries. The proposed summarization framework adopts a "semi-structured" view of the relational database, modeling a relational data set as a graph of tuples and join queries as graph traversals respectively. The key idea is to approximate the structure of the induced data graph in a concise synopsis, and to estimate the selectivity of a query by performing the corresponding traversal over the summarized graph. We detail the TUG synopsis model that is based on this novel approach, and we describe an efficient and scalable construction algorithm for building accurate TUGs within a specific storage budget. We validate the performance of TUGs with an extensive experimental study on real-life and synthetic data sets. Our results verify the effectiveness of TUGs in generating accurate selectivity estimates for complex join queries, and demonstrate their benefits over existing summarization techniques.