TuG synopses for approximate query answering

Authors:
Joshua Spiegel; Neoklis Polyzotis
Affiliations:
BEA Systems, Redwood Shores, CA; University of California at Santa Cruz, Santa Cruz, CA
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2009

Abstract

This article introduces Tuple Graph (TuG) synopses, a new class of data summaries that enable accurate approximate answers for complex relational queries. The proposed summarization framework adopts a “semi-structured” view of the relational database, modeling a relational data set as a graph of tuples and join queries as traversals of that graph. The key idea is to approximate the structure of the induced data graph in a concise synopsis, and to approximate the answer to a query by performing the corresponding traversal over the summarized graph. We detail the TuG synopsis model that is based on this novel approach, and we describe an efficient and scalable construction algorithm for building accurate TuGs within a specific storage budget. We validate the performance of TuGs with an extensive experimental study on real-life and synthetic data sets. Our results verify the effectiveness of TuGs in generating accurate approximate answers for complex join queries, and demonstrate their benefits over existing summarization techniques.
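The graph view described in the abstract can be illustrated with a small sketch. This is not the TuG model or construction algorithm from the article; it is a minimal toy under stated assumptions: tuples are graph nodes, each joining pair of tuples is an edge, tuples are grouped by a hand-written and hypothetical mapping `group_of`, and each synopsis edge stores only the average number of neighbors per tuple. The function names `build_synopsis` and `estimate_paths` are likewise made up for illustration.

```python
from collections import defaultdict


def build_synopsis(edges, group_of):
    """Summarize a tuple graph (illustrative only, not the article's model).

    edges: (src_tuple, dst_tuple) pairs, one per pair of joining tuples.
    group_of: hypothetical mapping from tuple id to its synopsis node.
    Returns (count, avg): count[g] is the number of tuples in group g and
    avg[(g, h)] is the average number of h-neighbors per tuple of g.
    """
    count = defaultdict(int)
    for g in group_of.values():
        count[g] += 1
    total = defaultdict(int)
    for src, dst in edges:
        total[(group_of[src], group_of[dst])] += 1
    avg = {gh: n / count[gh[0]] for gh, n in total.items()}
    return dict(count), avg


def estimate_paths(count, avg, start_groups, hops):
    """Estimate the size of a chain join by traversing the summarized graph."""
    # weight[g] = estimated number of partial join results ending in group g
    weight = {g: float(count[g]) for g in start_groups}
    for _ in range(hops):
        nxt = defaultdict(float)
        for (g, h), a in avg.items():
            if g in weight:
                nxt[h] += weight[g] * a
        weight = nxt
    return sum(weight.values())


# Toy data: relations R, S, T, each collapsed into a single synopsis node.
group_of = {"r1": "R", "r2": "R", "s1": "S", "s2": "S",
            "t1": "T", "t2": "T", "t3": "T"}
edges = [("r1", "s1"), ("r2", "s1"), ("r2", "s2"),   # R joins S
         ("s1", "t1"), ("s1", "t2"), ("s2", "t3")]   # S joins T

count, avg = build_synopsis(edges, group_of)
two_way = estimate_paths(count, avg, ["R"], hops=1)    # R join S
three_way = estimate_paths(count, avg, ["R"], hops=2)  # R join S join T
print(two_way, three_way)
```

On this toy data the one-hop estimate equals the true size of R ⋈ S (3 joining pairs), while the two-hop estimate is 4.5 against a true chain-join size of 5. The gap comes from assuming every tuple in a group has the average number of neighbors, which is the kind of approximation error a storage-bounded summary has to trade off against its size.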