Dynamic sample selection for approximate query processing

Authors:
Brian Babcock; Surajit Chaudhuri; Gautam Das
Affiliations:
Stanford University; Microsoft Research; Microsoft Research
Venue:
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Year:
2003

Abstract

In decision support applications, the ability to provide fast approximate answers to aggregation queries is desirable. One commonly used technique for approximate query answering is sampling. For many aggregation queries, appropriately constructed biased (non-uniform) samples can provide more accurate approximations than a uniform sample. The optimal type of bias, however, varies from query to query. In this paper, we describe an approximate query processing technique that dynamically constructs an appropriately biased sample for each query by combining samples drawn from a family of non-uniform samples constructed during a pre-processing phase. We show that dynamically selecting appropriate portions of previously constructed samples can provide more accurate approximate answers than static, non-adaptive use of uniform or non-uniform samples.
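To make the idea of per-query bias concrete, here is a minimal Python sketch of one way to assemble a query-specific biased sample from a precomputed family of samples. It assumes a family made of one uniform Bernoulli sample plus, for each column, a sample holding every row whose value in that column is rare (occurs fewer than a threshold number of times). At query time, only the rare-value samples for the GROUP BY columns are combined with the uniform sample, and each row is weighted by the inverse of its inclusion probability. The functions `preprocess` and `estimate_group_counts`, the `rate` and `threshold` parameters, and the in-memory row-id representation are hypothetical illustrations, not the paper's interface, and the scheme may differ from the paper's exact algorithm.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]


def preprocess(rows: List[Row], columns: Sequence[str], rate: float,
               threshold: int, seed: int = 0) -> Tuple[Set[int], Dict[str, Set[int]]]:
    """Pre-processing phase: build a hypothetical family of samples (as row ids).

    - `uniform`: a Bernoulli sample that keeps each row with probability `rate`.
    - `rare[col]`: every row whose value in `col` occurs fewer than `threshold`
      times, i.e. a sample biased toward small groups on that column.
    """
    rng = random.Random(seed)
    uniform = {i for i in range(len(rows)) if rng.random() < rate}
    rare: Dict[str, Set[int]] = {}
    for col in columns:
        freq = Counter(row[col] for row in rows)
        rare[col] = {i for i, row in enumerate(rows) if freq[row[col]] < threshold}
    return uniform, rare


def estimate_group_counts(rows: List[Row], uniform: Set[int],
                          rare: Dict[str, Set[int]], rate: float,
                          group_by: Sequence[str]) -> Dict[tuple, float]:
    """Query phase: estimate COUNT(*) ... GROUP BY `group_by`.

    Only the rare-value samples for the queried columns are selected, so the
    bias of the combined sample is chosen per query.
    """
    # Rows in a selected rare-value sample are included with probability 1.
    exact_ids: Set[int] = set().union(*(rare[c] for c in group_by))
    counts: Dict[tuple, float] = defaultdict(float)
    for i in exact_ids:
        counts[tuple(rows[i][c] for c in group_by)] += 1.0
    # Every other row was kept with probability `rate`, so it is weighted by
    # 1 / rate (inverse inclusion probability). Skipping exact_ids avoids
    # counting a row twice.
    for i in uniform - exact_ids:
        counts[tuple(rows[i][c] for c in group_by)] += 1.0 / rate
    return dict(counts)


if __name__ == "__main__":
    # Toy table: 10 rows in a "rare" region, 990 in a "common" one.
    rows = [{"region": "rare" if i < 10 else "common", "product": i % 3}
            for i in range(1000)]
    uniform, rare = preprocess(rows, ["region", "product"], rate=0.1, threshold=50)
    print(estimate_group_counts(rows, uniform, rare, 0.1, ["region"]))
```

In this toy run, the "rare" region's count is exact because all of its rows come from the selected rare-value sample, whereas a 10% uniform sample alone could easily miss that group entirely. A query grouping on `product` instead would select a different member of the family, which is the sense in which the bias adapts to each query.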