This paper introduces the problem of random sampling from time-based sliding windows over weighted streaming data and presents a priority random sampling (PRS) algorithm for it. The algorithm extends the classic reservoir-sampling algorithm and the weighted random sampling algorithm with a reservoir (WRS) to handle the expiration of data items from a time-based sliding window, and can avoid the drawbacks of both. PRS assigns a key to each data item in the time-based sliding window by combining its weight and arrival time, and it works even when the number of data items in the window varies dynamically over time. The experiments show that the PRS algorithm is somewhat superior to the WRS algorithm.
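The abstract does not give the PRS key formula, so the following is only a rough sketch of the general idea and not the paper's algorithm. It assumes the well-known weighted-reservoir key u^(1/w), with u uniform in [0, 1), and handles expiration with priority-sampling-style pruning: an item is discarded once k newer items hold larger keys, because those items will outlive it in the window. The class name, the parameters k and window, and the assumption that timestamps arrive in non-decreasing order are all hypothetical choices made for illustration.

```python
import heapq
import random


class WindowedWeightedSampler:
    """Illustrative weighted sample of up to k items from the last `window` time units.

    Not the PRS algorithm from the paper: keys use the assumed form u ** (1 / w).
    """

    def __init__(self, k, window, rng=None):
        self.k = k
        self.window = window
        self.rng = rng or random.Random()
        self.items = []  # (timestamp, key, value) in arrival order

    def _expire(self, now):
        # The window covers the half-open interval (now - window, now].
        cutoff = now - self.window
        self.items = [e for e in self.items if e[0] > cutoff]

    def add(self, value, weight, timestamp):
        """Insert an item; timestamps are assumed to be non-decreasing."""
        if weight <= 0:
            raise ValueError("weight must be positive")
        self._expire(timestamp)
        key = self.rng.random() ** (1.0 / weight)  # larger weight -> larger key on average
        self.items.append((timestamp, key, value))

        # Walk from newest to oldest, tracking the k largest keys among newer items.
        # An item beaten by k newer keys can never re-enter the sample, so drop it.
        kept, top = [], []  # top is a min-heap of at most k keys
        for entry in reversed(self.items):
            if len(top) == self.k and top[0] > entry[1]:
                continue
            kept.append(entry)
            if len(top) < self.k:
                heapq.heappush(top, entry[1])
            elif entry[1] > top[0]:
                heapq.heapreplace(top, entry[1])
        kept.reverse()
        self.items = kept

    def sample(self, now):
        """Return the values with the k largest keys among unexpired items."""
        self._expire(now)
        best = heapq.nlargest(self.k, self.items, key=lambda e: e[1])
        return [value for _, _, value in best]
```

A caller would feed `add(value, weight, timestamp)` as items arrive and call `sample(now)` whenever a sample of the current window is needed. This version rescans the stored items on every insert to keep the code short; a production implementation would likely use a more efficient structure.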