The ability to approximately answer aggregation queries accurately and efficiently is of great benefit for decision support and data mining tools. In contrast to previous sampling-based studies, we treat the problem as an optimization problem whose goal is to minimize the error in answering queries in the given workload. A key novelty of our approach is that we can tailor the choice of samples to be robust even for workloads that are “similar” but not necessarily identical to the given workload. Finally, our techniques recognize the importance of taking into account the variance in the data distribution in a principled manner. We show how our solution can be implemented on a database system, and present results of extensive experiments on Microsoft SQL Server 2000 that demonstrate the superior quality of our method compared to previous work.
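As a rough illustration of why variance matters when choosing samples, the sketch below uses classical Neyman allocation, a textbook technique swapped in here for illustration and not the workload-based optimization the abstract describes. The function names `neyman_allocation` and `estimate_sum`, and the representation of strata as Python lists, are assumptions made for this example only.

```python
import random
import statistics


def neyman_allocation(strata, budget):
    """Split a sample budget across strata in proportion to N_h * S_h.

    Illustrative textbook allocation (an assumption of this sketch),
    not the method of the paper. Strata with higher variance get
    more of the budget.
    """
    weights = [len(s) * statistics.pstdev(s) for s in strata]
    total = sum(weights)
    if total == 0:
        # Every stratum is constant: fall back to proportional allocation.
        weights = [len(s) for s in strata]
        total = sum(weights)
    # Keep at least one row per stratum and never exceed the stratum size.
    # Because of rounding and the one-row minimum, the sizes may not add
    # up to exactly `budget`.
    return [
        min(len(s), max(1, round(budget * w / total)))
        for s, w in zip(strata, weights)
    ]


def estimate_sum(strata, sizes, rng):
    """Estimate SUM over all rows by scaling each stratum's sample mean."""
    estimate = 0.0
    for rows, n in zip(strata, sizes):
        sample = rng.sample(rows, n)  # sampling without replacement
        estimate += len(rows) * (sum(sample) / n)
    return estimate


# Hypothetical data: a large constant stratum and a small, spread-out one.
strata = [[10.0] * 1000, [float(x) for x in range(0, 1000, 10)]]
sizes = neyman_allocation(strata, budget=50)
approx_total = estimate_sum(strata, sizes, random.Random(0))
```

In this toy data the constant stratum needs only a single row to be summarized exactly, so nearly the entire budget goes to the high-variance stratum. A uniform sample of the same size would spend most of its rows on the constant stratum instead.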