Optimal histograms for limiting worst-case error propagation in the size of join results
ACM Transactions on Database Systems (TODS)
Approximate medians and other quantiles in one pass and with limited memory
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Space-efficient online computation of quantile summaries
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Querying and mining data streams: you only get one look a tutorial
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Data streams: algorithms and applications
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Gigascope: a stream database for network applications
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Holistic UDAFs at streaming speeds
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Approximate counts and quantiles over sliding windows
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
How to summarize the universe: dynamic maintenance of quantiles
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
The history of histograms (abridged)
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Approximate Processing of Massive Continuous Quantile Queries over High-Speed Data Streams
IEEE Transactions on Knowledge and Data Engineering
Space- and time-efficient deterministic algorithms for biased quantiles over data streams
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Maintaining stream statistics over multiscale sliding windows
ACM Transactions on Database Systems (TODS)
Continuously maintaining order statistics over data streams: extended abstract
ADC '07 Proceedings of the eighteenth conference on Australasian database - Volume 63
An efficient algorithm for approximate biased quantile computation in data streams
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Continuously monitoring top-k uncertain data streams: a probabilistic threshold method
Distributed and Parallel Databases
Cluster based rank query over multidimensional data streams
Proceedings of the 18th ACM conference on Information and knowledge management
Aggregate computation over data streams
APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development
Logging every footstep: quantile summaries for the entire history
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Efficient quantile retrieval on multi-dimensional data
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Maintaining moving sums over data streams
ADMA'06 Proceedings of the Second international conference on Advanced Data Mining and Applications
Fast computation of approximate biased histograms on sliding windows over data streams
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Hi-index | 0.00 |
Skew is prevalent in many data sources such as IP traffic streams. To continually summarize the distribution of such data, a high-biased set of quantiles (e.g., 50th, 90th and 99th percentiles) with finer error guarantees at higher ranks (e.g., errors of 5, 1 and 0.1 percent, respectively) is more useful than uniformly distributed quantiles (e.g., 25th, 50th and 75th percentiles) with uniform error guarantees. In this paper, we address the following two problems. First, can we compute quantiles with finer error guarantees for the higher ranks of the data distribution effectively, using less space and computation time than computing all quantiles uniformly at the finest error? Second, if specific quantiles and their error bounds are requested a priori, can the necessary space usage and computation time be reduced? We answer both questions in the affirmative by formalizing them as the "high-biased" and the "targeted" quantiles problems, respectively, and presenting algorithms with provable guarantees, that perform significantly better than previously known solutions for these problems. We implemented our algorithms in the Gigascope data stream management system, and evaluated alternate approaches for maintaining the relevant summary structures. Our experimental results on real and synthetic IP data streams complement our theoretical analyses, and highlight the importance of lightweight, non-blocking implementations when maintaining summary structures over high-speed data streams.