Existing random sampling methods suffer from at least one of the following disadvantages: they (1) apply only to certain update patterns, (2) entail large space overhead, or (3) incur prohibitive maintenance cost. Despite the considerable success of random sampling in conventional databases, these drawbacks prevent their effective application in stream environments, where a relation is updated by a large volume of insertions and deletions that may arrive in any order. Motivated by this, we develop several fully dynamic algorithms for obtaining random samples from individual relations and from the join result of two tables. Our solutions handle any update pattern with small space and computational overhead. We also present an in-depth analysis that characterizes the alternative sampling strategies and leads to precision guarantees. Extensive experiments validate our theoretical findings and demonstrate the efficiency of our techniques in practice.
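To make the first disadvantage concrete, the following is a minimal sketch of the classic insert-only reservoir technique (often called Algorithm R). It is a baseline of the kind the abstract criticizes, not one of the fully dynamic algorithms proposed in this work. The class name `Reservoir` and its fields are illustrative assumptions.

```python
import random


class Reservoir:
    """Fixed-size uniform sample over an insert-only stream (Algorithm R).

    Illustrative baseline only: it has no way to process deletions,
    which is the kind of update-pattern restriction the abstract describes.
    """

    def __init__(self, n, seed=None):
        self.n = n                # target sample size
        self.seen = 0             # number of insertions processed so far
        self.sample = []          # current sample, at most n items
        self.rng = random.Random(seed)

    def insert(self, item):
        self.seen += 1
        if len(self.sample) < self.n:
            # The first n items always enter the sample.
            self.sample.append(item)
        else:
            # Draw j uniformly from [0, seen); the new item replaces
            # slot j when j < n, which happens with probability n / seen.
            j = self.rng.randrange(self.seen)
            if j < self.n:
                self.sample[j] = item


r = Reservoir(5, seed=1)
for x in range(1000):
    r.insert(x)
print(r.sample)  # five distinct values drawn from 0..999
```

The sketch keeps only `n` items and does constant work per insertion, so it avoids the space and maintenance costs mentioned above, but only because it assumes that tuples are never deleted.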