The P2 algorithm for dynamic calculation of quantiles and histograms without storing observations
Communications of the ACM
Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Mining quantitative association rules in large relational tables
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Improved histograms for selectivity estimation of range predicates
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Histogram-based estimation techniques in database systems
Histogram-based estimation techniques in database systems
Approximate medians and other quantiles in one pass and with limited memory
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
The space complexity of approximating the frequency moments
Journal of Computer and System Sciences
Incremental quantile estimation for massive tracking
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Space-efficient online computation of quantile summaries
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Parallel sorting on a shared-nothing architecture using probabilistic splitting
PDIS '91 Proceedings of the first international conference on Parallel and distributed information systems
Fast, small-space algorithms for approximate histogram maintenance
STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
Mining database structure; or, how to build a data quality browser
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Dynamic Maintenance of Wavelet-Based Histograms
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries
Proceedings of the 27th International Conference on Very Large Data Bases
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
Proceedings of the 27th International Conference on Very Large Data Bases
Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Fast Incremental Maintenance of Approximate Histograms
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Fast incremental maintenance of approximate histograms
ACM Transactions on Database Systems (TODS)
RHist: adaptive summarization over continuous data streams
Proceedings of the eleventh international conference on Information and knowledge management
Correlating XML data streams using tree-edit distance embeddings
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
What's hot and what's not: tracking most frequent items dynamically
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Clustering Data Streams: Theory and Practice
IEEE Transactions on Knowledge and Data Engineering
One-Pass Wavelet Decompositions of Data Streams
IEEE Transactions on Knowledge and Data Engineering
Processing set expressions over continuous update streams
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Holistic UDAFs at streaming speeds
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Approximation techniques for spatial data
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Tracking set-expression cardinalities over continuous update streams
The VLDB Journal — The International Journal on Very Large Data Bases
Effective Computation of Biased Quantiles over Data Streams
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
What's hot and what's not: tracking most frequent items dynamically
ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2003
XML stream processing using tree-edit distance embeddings
ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2003
PADS: a domain-specific language for processing ad hoc data
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Join-distinct aggregate estimation over update streams
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Holistic aggregates in a networked world: distributed tracking of approximate quantiles
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Domain-Driven Data Synopses for Dynamic Quantiles
IEEE Transactions on Knowledge and Data Engineering
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
Approximate Processing of Massive Continuous Quantile Queries over High-Speed Data Streams
IEEE Transactions on Knowledge and Data Engineering
Summarizing level-two topological relations in large spatial datasets
ACM Transactions on Database Systems (TODS)
Approximate quantiles and the order of the stream
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
PADS: an end-to-end system for processing ad hoc data
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Data streams: algorithms and applications
Foundations and Trends® in Theoretical Computer Science
Extending the data warehouse for service provisioning data
Data & Knowledge Engineering - Special issue: ER 2003
Reverse nearest neighbor aggregates over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
The history of histograms (abridged)
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Finding hierarchical heavy hitters in data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Multiscale histograms: summarizing topological relations in large spatial datasets
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Distributed set-expression cardinality estimation
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Resource sharing in continuous sliding-window aggregates
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Finding hierarchical heavy hitters in streaming data
ACM Transactions on Knowledge Discovery from Data (TKDD)
Proof-infused streams: enabling authentication of sliding window queries on streams
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Probabilistic lossy counting: an efficient algorithm for finding heavy hitters
ACM SIGCOMM Computer Communication Review
Tight lower bounds for selection in randomly ordered streams
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Finding popular categories for RFID tags
Proceedings of the 9th ACM international symposium on Mobile ad hoc networking and computing
Finding Frequent Items over General Update Streams
SSDBM '08 Proceedings of the 20th international conference on Scientific and Statistical Database Management
Finding frequent items in data streams
Proceedings of the VLDB Endowment
The eternal sunshine of the sketch data structure
Computer Networks: The International Journal of Computer and Telecommunications Networking
A framework for estimating complex probability density structures in data streams
Proceedings of the 17th ACM conference on Information and knowledge management
Multi-query optimization for sketch-based estimation
Information Systems
ODMCA: An adaptive data mining control algorithm in multicarrier networks
Computer Communications
Grid-aware approach to data statistics, data understanding and data preprocessing
International Journal of High Performance Computing and Networking
Optimal tracking of distributed heavy hitters and quantiles
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Finding the frequent items in streams of data
Communications of the ACM - A View of Parallel Computing
Small synopses for group-by query verification on outsourced data streams
ACM Transactions on Database Systems (TODS)
Continuously monitoring top-k uncertain data streams: a probabilistic threshold method
Distributed and Parallel Databases
Incremental tracking of multiple quantiles for network monitoring in cellular networks
Proceedings of the 1st ACM workshop on Mobile internet through cellular networks
Computing histograms of local variables for real-time monitoring using aggregation trees
IM'09 Proceedings of the 11th IFIP/IEEE international conference on Symposium on Integrated Network Management
Methods for finding frequent items in data streams
The VLDB Journal — The International Journal on Very Large Data Bases
A Streaming Parallel Decision Tree Algorithm
The Journal of Machine Learning Research
Aggregate computation over data streams
APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development
Logging every footstep: quantile summaries for the entire history
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Lower bounds on frequency estimation of data streams
CSR'08 Proceedings of the 3rd international conference on Computer science: theory and applications
Sampling based algorithms for quantile computation in sensor networks
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
GCC'05 Proceedings of the 4th international conference on Grid and Cooperative Computing
Fast approximate wavelet tracking on streams
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
On futuristic query processing in data streams
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Efficient quantile retrieval on multi-dimensional data
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Adaptive spatial partitioning for multidimensional data streams
ISAAC'04 Proceedings of the 15th international conference on Algorithms and Computation
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Foundations and Trends in Databases
Lower bounds for quantile estimation in random-order and multi-pass streaming
ICALP'07 Proceedings of the 34th international conference on Automata, Languages and Programming
CR-PRECIS: a deterministic summary structure for update data streams
ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies
Quantiles over data streams: an experimental study
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Fast computation of approximate biased histograms on sliding windows over data streams
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
ACM Transactions on Database Systems (TODS) - Invited papers issue
Indexing for summary queries: Theory and practice
ACM Transactions on Database Systems (TODS)
Sketch-based geometric monitoring of distributed stream queries
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
Order statistics, i.e., quantiles, are frequently used in databases both at the database server as well as the application level. For example, they are useful in selectivity estimation during query optimization, in partitioning large relations, in estimating query result sizes when building user interfaces, and in characterizing the data distribution of evolving datasets in the process of data mining. We present a new algorithm for dynamically computing quantiles of a relation subject to insert as well as delete operations. The algorithm monitors the operations and maintains a simple, small-space representation (based on random subset sums or RSSs) of the underlying data distribution. Using these RSSs, we can quickly estimate, without having to access the data, all the quantiles, each guaranteed to be accurate to within user-specified precision. Previously-known one-pass quantile estimation algorithms that provide similar quality and performance guarantees can not handle deletions. Other algorithms that can handle delete operations cannot guarantee performance without rescanning the entire database. We present the algorithm, its theoretical performance analysis and extensive experimental results with synthetic and real datasets. Independent of the rates of insertions and deletions, our algorithm is remarkably precise at estimating quantiles in small space, as our experiments demonstrate.