Effective Computation of Biased Quantiles over Data Streams

Authors:
Graham Cormode;Flip Korn;S. Muthukrishnan;Divesh Srivastava
Affiliations:
Bell Labs, Lucent Technologies;AT&T Labs-Research;Rutgers University;AT&T Labs-Research
Venue:
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Year:
2005

Abstract

Skew is prevalent in many data sources, such as IP traffic streams. To continually summarize the distribution of such data, a high-biased set of quantiles (e.g., the 50th, 90th and 99th percentiles) with finer error guarantees at higher ranks (e.g., errors of 5, 1 and 0.1 percent, respectively) is more useful than uniformly distributed quantiles (e.g., the 25th, 50th and 75th percentiles) with uniform error guarantees. In this paper, we address the following two problems. First, can we compute quantiles with finer error guarantees for the higher ranks of the data distribution effectively, using less space and computation time than computing all quantiles uniformly at the finest error? Second, if specific quantiles and their error bounds are requested a priori, can the necessary space usage and computation time be reduced? We answer both questions in the affirmative by formalizing them as the "high-biased" and the "targeted" quantiles problems, respectively, and presenting algorithms with provable guarantees that perform significantly better than previously known solutions for these problems. We implemented our algorithms in the Gigascope data stream management system and evaluated alternative approaches for maintaining the relevant summary structures. Our experimental results on real and synthetic IP data streams complement our theoretical analyses and highlight the importance of lightweight, non-blocking implementations when maintaining summary structures over high-speed data streams.
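
To make the error guarantees in the abstract concrete, the following minimal sketch computes which ranks would count as acceptable answers for each requested quantile. It is not the paper's algorithm and maintains no stream summary; it only illustrates the problem statement. The names `TARGETS` and `rank_window` are hypothetical, and the sketch assumes that an error of, say, 5 percent means the returned item's rank may deviate from the ideal rank by at most 5 percent of the stream length n.

```python
from fractions import Fraction
import math

# (quantile, allowed error) pairs from the abstract's example, stored as exact
# fractions so boundary ranks are not disturbed by floating-point rounding.
TARGETS = [
    (Fraction("0.50"), Fraction("0.05")),
    (Fraction("0.90"), Fraction("0.01")),
    (Fraction("0.99"), Fraction("0.001")),
]


def rank_window(phi, eps, n):
    """Inclusive range of 1-based ranks acceptable as an answer for quantile phi.

    Assumption (not stated in the abstract): eps is a fraction of the stream
    length n, so the acceptable ranks are those within (phi - eps) * n and
    (phi + eps) * n, clipped to the valid range 1..n.
    """
    low = max(1, math.ceil((phi - eps) * n))
    high = min(n, math.floor((phi + eps) * n))
    return low, high


if __name__ == "__main__":
    n = 100_000  # illustrative stream length
    for phi, eps in TARGETS:
        low, high = rank_window(phi, eps, n)
        print(f"phi={float(phi):.2f}: ranks {low}..{high} "
              f"({high - low + 1} acceptable ranks)")
```

Under these assumptions, the acceptable window for the 99th percentile is roughly fifty times narrower than the one for the median. This suggests why a summary that enforces the finest error at every rank would do more work than the biased guarantees require.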