Join synopses for approximate query answering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex aggregate queries based on statistical summaries of the full data. In this paper, we demonstrate the difficulty of providing good approximate answers for join queries using only statistics (in particular, samples) from the base relations. We propose join synopses as an effective solution to this problem and show that precomputing just one join synopsis for each relation suffices to significantly improve the quality of approximate answers for arbitrary queries with foreign-key joins. We present optimal strategies for allocating the available space among the various join synopses when the query workload is known, and identify heuristics for the common case when the workload is not known. We also present efficient algorithms for incrementally maintaining join synopses in the presence of updates to the base relations. Our extensive set of experiments on the TPC-D benchmark database shows the effectiveness of join synopses and of the other techniques proposed in this paper.
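The contrast between sampling the base relations and precomputing a join synopsis can be illustrated with a minimal sketch. It is not the paper's implementation: the two-relation schema (`orders` referencing `customers` through `cust_id`), the helper names `build_join_synopsis` and `estimate_sum`, and the sample sizes are all assumptions chosen for illustration. The sketch assumes a join synopsis is a uniform sample of the source relation with each sampled tuple joined to the tuple it references by foreign key, and that aggregates are estimated by scaling the sample result by the inverse sampling fraction.

```python
import random


def build_join_synopsis(source, referenced, fk, sample_size, rng):
    """Uniformly sample `source` and join each sampled row, via the
    foreign key `fk`, with the single row it references.
    (Hypothetical helper; assumes every foreign key resolves.)"""
    sample = rng.sample(source, sample_size)
    return [{**row, **referenced[row[fk]]} for row in sample]


def estimate_sum(synopsis, source_size, value, predicate):
    """Scale the sample SUM by source_size / sample_size."""
    scale = source_size / len(synopsis)
    return scale * sum(row[value] for row in synopsis if predicate(row))


rng = random.Random(0)

# Assumed toy schema: 1,000 customers, 100,000 orders referencing them.
customers = {c: {"region": "EU" if c % 2 == 0 else "US"} for c in range(1000)}
orders = [
    {"order_id": i, "cust_id": rng.randrange(1000), "amount": 10.0}
    for i in range(100_000)
]


def in_eu(row):
    return row["region"] == "EU"


# Join synopsis: a 1% sample of orders, already joined with customers.
synopsis = build_join_synopsis(orders, customers, "cust_id", 1000, rng)
estimate = estimate_sum(synopsis, len(orders), "amount", in_eu)
exact = sum(o["amount"] for o in orders if customers[o["cust_id"]]["region"] == "EU")

# Baseline: join independent 1% samples of both base relations.
order_sample = rng.sample(orders, 1000)
cust_sample = set(rng.sample(sorted(customers), 10))
joined = [o for o in order_sample if o["cust_id"] in cust_sample]

print("exact SUM:", exact)
print("join-synopsis estimate:", estimate)
print("rows in join synopsis:", len(synopsis))
print("rows surviving join of base samples:", len(joined))
```

Under these assumptions, every row of the synopsis contributes to the estimate, whereas the join of two independent samples keeps only the sampled orders whose customer also happens to be sampled, leaving far fewer rows to estimate from.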