We propose an efficient algorithm for approximate biased quantile computation in large data streams. Our algorithm computes decomposable biased quantile summaries on fixed-size blocks and dynamically maintains the biased quantile summary for the entire stream as an exponential histogram over the block-wise quantile summaries. The algorithm is computationally efficient, with an amortized computational cost of $O\left(\log\left(\frac{1}{\epsilon}\log(\epsilon n)\right)\right)$ and a space requirement of $O\left(\frac{\log^3(\epsilon n)}{\epsilon}\right)$, where $n$ is the number of observations in the stream and $\epsilon$ is the error parameter. Our algorithm does not assume prior knowledge of the stream size or of the range of data values in the stream. In practice, it efficiently maintains summaries over data streams with tens of millions of observations and achieves significant performance improvements over prior algorithms.
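The abstract does not spell out the summary format or the merge procedure, so the Python below is only a minimal sketch of the overall structure it describes, under stated assumptions. Items are buffered into fixed-size blocks; each full block becomes a summary of (value, rmin, rmax) tuples, and the summaries are kept in an exponential histogram where level k covers 2^k blocks, so a new block merges upward like a carry in a binary counter. The rank-interval `merge` keeps valid rank bounds for every retained tuple. The names `exact_summary`, `merge`, `compress`, and `BlockedQuantileSketch` are hypothetical, and the pruning rule in `compress` is an illustrative stand-in for a biased summary (denser at low ranks), not the authors' method; it carries no space or error guarantee.

```python
import math
from bisect import bisect_left, bisect_right

# Hypothetical sketch, not the paper's implementation.
# A summary is a sorted list of (value, rmin, rmax) tuples: rmin and rmax
# bound the 1-based rank of `value` within the items the summary covers.


def exact_summary(items):
    """Summary of a small, fully stored batch: every rank is known exactly."""
    return [(v, i + 1, i + 1) for i, v in enumerate(sorted(items))]


def merge(a, na, b, nb):
    """Combine summaries of two disjoint multisets (sizes na and nb).

    Each output tuple keeps valid rank bounds in the union: the number of
    elements of the other set below v lies between the rmin of v's
    predecessor there and (rmax of v's successor there) - 1.
    """
    out = []
    for src, other, n_other in ((a, b, nb), (b, a, na)):
        keys = [t[0] for t in other]
        for v, lo, hi in src:
            i = bisect_left(keys, v)    # other[:i] holds values < v
            j = bisect_right(keys, v)   # other[j:] holds values > v
            lo_other = other[i - 1][1] if i > 0 else 0
            hi_other = other[j][2] - 1 if j < len(other) else n_other
            out.append((v, lo + lo_other, hi + hi_other))
    out.sort(key=lambda t: t[0])
    return out


def compress(summary, eps):
    """Hypothetical biased pruning: drop a tuple only if the rank gap it
    leaves is at most max(1, 2*eps*rank), so low ranks stay dense.
    Dropping tuples never invalidates the bounds of the tuples that remain."""
    if len(summary) <= 2:
        return summary
    kept = [summary[0]]
    for i in range(1, len(summary) - 1):
        last, nxt = kept[-1], summary[i + 1]
        if nxt[2] - last[1] > max(1, 2 * eps * last[1]):
            kept.append(summary[i])
    kept.append(summary[-1])
    return kept


class BlockedQuantileSketch:
    """Exponential histogram of block summaries: levels[k] is None or
    (summary, count) covering block_size * 2**k items."""

    def __init__(self, eps, block_size):
        self.eps = eps
        self.block_size = block_size
        self.buffer = []   # current, not yet full block
        self.levels = []

    def insert(self, x):
        self.buffer.append(x)
        if len(self.buffer) < self.block_size:
            return
        # Summarize the full block, then carry it upward through the levels.
        carry = (compress(exact_summary(self.buffer), self.eps), len(self.buffer))
        self.buffer = []
        k = 0
        while k < len(self.levels) and self.levels[k] is not None:
            summary, count = self.levels[k]
            self.levels[k] = None
            carry = (compress(merge(summary, count, *carry), self.eps),
                     count + carry[1])
            k += 1
        if k == len(self.levels):
            self.levels.append(None)
        self.levels[k] = carry

    def query(self, phi):
        """Return (value, rmin, rmax) for the tuple nearest rank ceil(phi*n)."""
        summary, n = exact_summary(self.buffer), len(self.buffer)
        for level in self.levels:
            if level is not None:
                summary, n = merge(summary, n, *level), n + level[1]
        if n == 0:
            raise ValueError("empty stream")
        r = min(n, max(1, math.ceil(phi * n)))
        return min(summary, key=lambda t: max(r - t[1], t[2] - r))
```

For example, after inserting 20,000 distinct values into `BlockedQuantileSketch(eps=0.01, block_size=256)`, `query(0.01)` returns a stored value together with an interval that contains its true rank. The binary-counter layout keeps at most one summary per level, so only about log2(n / block_size) summaries exist at any time. The point of the example is this layout, not the tight bounds claimed in the abstract.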