Space- and time-efficient deterministic algorithms for biased quantiles over data streams
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Continuously maintaining order statistics over data streams: extended abstract
ADC '07 Proceedings of the eighteenth conference on Australasian database - Volume 63
Estimating the sortedness of a data stream
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
An efficient algorithm for approximate biased quantile computation in data streams
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Continuously monitoring top-k uncertain data streams: a probabilistic threshold method
Distributed and Parallel Databases
Aggregate computation over data streams
APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development
Sampling based algorithms for quantile computation in sensor networks
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Fast computation of approximate biased histograms on sliding windows over data streams
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Hi-index | 0.00 |
We consider the problem of continuously maintaining order sketches over data streams with a relative rank error guarantee ∊. Novel space-efficient and one-scan randomised techniques are developed. Our first randomised algorithm can guarantee such a relative error precision ∊ with confidence 1 - \deltausing O( 1\_ \in \frac{1} {2}2 log 1d log ∊^2N) space, where N is the number of data elements seen so far in a data stream. Then, a new one-scan space compression technique is developed. Combined with the first randomised algorithm, the one-scan space compression technique yields another one-scan randomised algorithm that guarantees the space requirement is O( 1\frac{1} { \in } log(1\frac{1}{ \in } log 1\begin{gathered} \frac{1}{\delta } \hfill \\ \hfill \\ \end{gathered})\frac{{\log ^{2 + \alpha }\in N}} {{1 - 1/2^\alpha}}(for\alpha \gt 0) on average while the worst case space remains O( \frac{1}{{ \in ^2 }}\log \frac{1} {\delta }\log\in ^2 N). These results are immediately applicable to approximately computing quantiles over data streams with a relative error guarantee\inand significantly improve the previous best space bound O( \frac{1} {{ \in ^3 }}\log \frac{1}{\delta }\log N). Our extensive experiment results demonstrate that both techniques can support an on-line computation against high speed data streams.