Reservoir-based sampling is often used to draw an unbiased sample from a data stream. As the stream evolves, however, a large portion of an unbiased sample may become less relevant. An analytical or mining task (e.g., query estimation) that uses only the sample points from a recent time horizon may then give very inaccurate results, because the number of relevant sample points shrinks with the horizon. Yet this is precisely the most important case for data stream algorithms, since recent history is analyzed frequently. We show that an effective solution in such cases is to bias the sample using temporal bias functions. Maintaining such a sample is non-trivial, because it must be updated dynamically without knowing the total number of points in advance. We prove theoretical properties of a large class of memory-less bias functions that allow an efficient implementation of the sampling algorithm. We also show that including bias in the sampling process imposes an upper bound on the required reservoir size. This is a useful property, since it means that the maximum relevant sample can often be maintained with limited storage. We illustrate the advantages of the method for query estimation and show that the approach also applies to broader data mining problems such as evolution analysis and classification.
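The abstract does not spell out the sampling algorithm, so the following is only a minimal sketch of one possible replacement policy for a temporally biased reservoir. It assumes an exponential (memory-less) bias in which a point of age a is retained with probability proportional to e^(-lam * a), and it assumes a reservoir capacity of about 1/lam. The names `BiasedReservoir`, `lam`, and `add` are hypothetical and chosen for illustration.

```python
import random


class BiasedReservoir:
    """Illustrative reservoir with exponential temporal bias (assumed policy)."""

    def __init__(self, lam, seed=None):
        if not 0 < lam <= 1:
            raise ValueError("lam must be in (0, 1]")
        # Assumption: the bias rate caps the useful reservoir size at about 1/lam.
        self.capacity = max(1, int(1 / lam))
        self.items = []
        self.rng = random.Random(seed)

    def add(self, point):
        """Insert every arriving point; sometimes overwrite a resident point."""
        fill = len(self.items) / self.capacity  # fraction of slots in use
        if self.rng.random() < fill:
            # With probability equal to the fill fraction, the new point
            # overwrites a uniformly chosen resident, so older points decay.
            victim = self.rng.randrange(len(self.items))
            self.items[victim] = point
        else:
            # Otherwise the reservoir grows by one slot (only while not full).
            self.items.append(point)


# Usage: stream the integers 0..9999 and inspect the retained sample.
reservoir = BiasedReservoir(lam=0.01, seed=7)
for t in range(10_000):
    reservoir.add(t)
sample_mean = sum(reservoir.items) / len(reservoir.items)
print(len(reservoir.items), sample_mean)
```

Once the reservoir is full, each resident point survives an arrival with probability 1 - 1/capacity, which approximates exponential decay at rate lam. An unbiased reservoir over the same stream would instead have a sample mean near the middle of the stream, whereas this sketch concentrates on recent arrivals.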