Reservoir sampling maintains a sample that serves as a "sketch" of the whole data. The existing reservoir sampling methods introduced by J. S. Vitter are based on simple random sampling. These algorithms work well for large sampling ratios, but their performance drops drastically for small ones. For streaming data, it is essential that the sampling algorithm works efficiently at very small ratios, because a data stream is potentially infinite in size. We propose distance-based sampling (DSS) for transactional data streams. DSS is designed to produce samples that are "close" to the whole data, and it maintains the accuracy of the final sample even at very small sampling ratios. An experimental comparison between DSS and the existing reservoir sampling methods shows that DSS significantly outperforms them, particularly at small sampling ratios.
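For reference, the simple-random-sampling reservoir scheme attributed to Vitter can be sketched as his Algorithm R. The sketch is written in Python. The `item_supports` and `support_distance` helpers are hypothetical: they illustrate one possible notion of a sample being "close" to the whole data, the Euclidean distance between per-item supports (the fraction of transactions containing each item) in the sample and in the full data. The abstract does not say which distance DSS uses, so this measure is an assumption for illustration only, and the code does not implement DSS itself.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Vitter's Algorithm R: one pass, O(k) memory, uniform sample of size k."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1),
            # replacing a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def item_supports(txns: Sequence[frozenset]) -> Dict[str, float]:
    """Fraction of transactions that contain each item (items assumed to be str)."""
    if not txns:
        return {}
    counts = Counter(item for t in txns for item in t)
    return {item: c / len(txns) for item, c in counts.items()}


def support_distance(sample: Sequence[frozenset], data: Sequence[frozenset]) -> float:
    """HYPOTHETICAL closeness measure (an assumption, not the DSS distance):
    Euclidean distance between the item-support vectors of sample and data."""
    s, d = item_supports(sample), item_supports(data)
    return sum((s.get(i, 0.0) - d.get(i, 0.0)) ** 2 for i in s.keys() | d.keys()) ** 0.5


if __name__ == "__main__":
    rng = random.Random(42)
    items = ["a", "b", "c", "d", "e"]
    # Synthetic transactions of 1 to 3 distinct items each (illustrative data only).
    data = [frozenset(rng.sample(items, rng.randint(1, 3))) for _ in range(10_000)]
    for k in (10, 100, 1000):
        sample = reservoir_sample(data, k, rng)
        print(k, round(support_distance(sample, data), 4))
```

Running the demo prints the hypothetical distance for several sample sizes. Under these assumptions, one would expect the distance to tend to shrink as k grows, because a uniform sample's support estimates vary more when k is small. This is the kind of small-ratio accuracy gap the abstract says DSS targets.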