Join synopses for approximate query answering

Authors:
Swarup Acharya;Phillip B. Gibbons;Viswanath Poosala;Sridhar Ramaswamy
Affiliations:
Information Sciences Research Center, Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ;Information Sciences Research Center, Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ;Information Sciences Research Center, Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ;Information Sciences Research Center, Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ
Venue:
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Year:
1999

Abstract

In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex aggregate queries based on statistical summaries of the full data. In this paper, we demonstrate the difficulty of providing good approximate answers for join queries using only statistics (in particular, samples) from the base relations. We propose join synopses as an effective solution to this problem and show that precomputing just one join synopsis for each relation suffices to significantly improve the quality of approximate answers for arbitrary queries with foreign key joins. We present optimal strategies for allocating the available space among the various join synopses when the query workload is known, and identify heuristics for the common case when the workload is not known. We also present efficient algorithms for incrementally maintaining join synopses in the presence of updates to the base relations. Our extensive experiments on the TPC-D benchmark database show the effectiveness of join synopses and the other techniques proposed in this paper.
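
To make the contrast in the abstract concrete, the following Python sketch compares two ways of answering an aggregate over a foreign key join: joining independent samples of the base relations, versus sampling the source relation and following its foreign keys into the full referenced relation. It is a toy illustration, not the paper's algorithm or its Aqua implementation. The two-table schema (`orders` referencing `customers`), the Bernoulli sampling, the scaling factors, and all function names are assumptions chosen for this example.

```python
import random


def build_tables(n_orders=10_000, n_customers=500, seed=0):
    """Build a toy schema: each order references one customer (a foreign key)."""
    rng = random.Random(seed)
    customers = {cid: {"region": rng.choice(["east", "west"])}
                 for cid in range(n_customers)}
    orders = [{"cust_id": rng.randrange(n_customers),
               "amount": rng.uniform(1, 100)}
              for _ in range(n_orders)]
    return customers, orders


def exact_sum(customers, orders, region):
    """Exact answer to: SUM(amount) over orders JOIN customers WHERE region = ?"""
    return sum(o["amount"] for o in orders
               if customers[o["cust_id"]]["region"] == region)


def join_of_base_samples(customers, orders, rate, rng):
    """Sample each base relation independently, then join the two samples.

    A joined tuple survives only if BOTH sides were sampled, so roughly
    rate * rate of the full join result remains.
    """
    cust_sample = {cid: c for cid, c in customers.items() if rng.random() < rate}
    order_sample = [o for o in orders if rng.random() < rate]
    return [(o, cust_sample[o["cust_id"]])
            for o in order_sample if o["cust_id"] in cust_sample]


def join_synopsis(customers, orders, rate, rng):
    """Sample only the source relation, then follow each foreign key into the
    FULL referenced relation, so every sampled order keeps its match."""
    order_sample = [o for o in orders if rng.random() < rate]
    return [(o, customers[o["cust_id"]]) for o in order_sample]


def estimate_sum(joined, region, scale):
    """Scale the sampled sum up by the inverse of the inclusion probability."""
    return scale * sum(o["amount"] for o, c in joined if c["region"] == region)


if __name__ == "__main__":
    customers, orders = build_tables()
    rng = random.Random(1)
    rate = 0.05
    base = join_of_base_samples(customers, orders, rate, rng)
    syn = join_synopsis(customers, orders, rate, rng)
    print("exact              :", exact_sum(customers, orders, "east"))
    print("join of samples    :", len(base), "tuples,",
          estimate_sum(base, "east", 1 / (rate * rate)))
    print("join synopsis-style:", len(syn), "tuples,",
          estimate_sum(syn, "east", 1 / rate))
```

With a 5% sampling rate, the join of independent samples is expected to retain about 0.25% of the join result, while the synopsis-style sample retains about 5%, which is why its estimate tends to be much more stable for the same per-relation sampling rate. Under these assumptions the synopsis also stores the joined customer attributes alongside each sampled order, so it trades some extra space per tuple for that stability.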