In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex decision support queries using precomputed summary statistics, such as samples. Decision support queries routinely segment the data into groups and then aggregate the information in each group (group-by queries). Depending on the data, the number of data items can vary widely from group to group. As a result, approximate answers based on uniform random samples of the data can be highly inaccurate for groups with few data items, since such groups are represented in the sample by very few (often zero) tuples.

In this paper, we propose a general class of techniques for obtaining fast, highly accurate answers for group-by queries. These techniques rely on precomputed non-uniform (biased) samples of the data. In particular, we propose congressional samples, a hybrid union of uniform and biased samples. Given a fixed amount of space, congressional samples seek to maximize the accuracy of all possible group-by queries on a set of columns. We present a one-pass algorithm for constructing a congressional sample and use this technique to incrementally keep the sample up to date without accessing the base relation. We also evaluate query rewriting strategies for providing approximate answers from congressional samples. Finally, we conduct an extensive set of experiments on the TPC-D database that demonstrate the efficacy of the proposed techniques.
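To make the "hybrid of uniform and biased samples" idea concrete, here is a minimal Python sketch for a single grouping column. It rests on stated assumptions and is not the paper's full algorithm. Each group's space allocation is assumed to be the larger of its uniform (size-proportional) share and an equal per-group share. The allocations are then scaled down so they fit the space budget. Per-group SUM estimates scale each group's sample sum by the inverse of its sampling rate. The function names `basic_congress_allocation`, `congressional_sample`, and `estimate_sums` are hypothetical.

```python
import random
from collections import defaultdict


def basic_congress_allocation(group_sizes, budget):
    """Per-group sample sizes (floats) summing to `budget`.

    Assumption (illustrative): each group gets max(uniform share, equal share),
    then all shares are scaled proportionally to fit the space budget.
    """
    total = sum(group_sizes.values())
    m = len(group_sizes)
    raw = {}
    for g, n in group_sizes.items():
        uniform_share = budget * n / total  # what a uniform sample would give
        equal_share = budget / m            # every group treated equally
        raw[g] = max(uniform_share, equal_share)
    scale = budget / sum(raw.values())
    return {g: s * scale for g, s in raw.items()}


def congressional_sample(rows, budget, seed=0):
    """Draw a per-group random sample of (group, value) rows.

    Returns {group: (group_size, sampled_values)}. Allocations are rounded,
    so the total sample size may differ slightly from `budget`; every group
    keeps at least one tuple and never more than it contains.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for g, v in rows:
        by_group[g].append(v)
    sizes = {g: len(vs) for g, vs in by_group.items()}
    alloc = basic_congress_allocation(sizes, budget)
    sample = {}
    for g, vs in by_group.items():
        k = min(len(vs), max(1, round(alloc[g])))
        sample[g] = (len(vs), rng.sample(vs, k))
    return sample


def estimate_sums(sample):
    """Estimate SUM(value) per group by scaling by n_g / k_g (inverse sampling rate)."""
    return {g: sum(vals) * n / len(vals) for g, (n, vals) in sample.items()}


if __name__ == "__main__":
    rows = [("A", 1.0)] * 9900 + [("B", 2.0)] * 100
    sample = congressional_sample(rows, budget=100)
    print({g: len(vals) for g, (_, vals) in sample.items()})
    print(estimate_sums(sample))
```

With a budget of 100 tuples over 10,000 rows, a uniform sample would hold about one tuple from the small group "B". Under this allocation rule "B" receives about a third of the budget, so its estimate is based on many more tuples. The trade-off is that the large group "A" is sampled at a lower rate than it would be uniformly.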