We present new algorithms for computing approximate quantiles of large datasets in a single pass. The approximation guarantees are explicit and hold for arbitrary value distributions and arrival distributions of the dataset. The main memory requirements are an order of magnitude smaller than those reported earlier.

We also discuss methods that couple the approximation algorithms with random sampling to further reduce memory requirements. With sampling, the approximation guarantees are explicit but probabilistic, i.e., they hold with respect to a user-controlled confidence parameter.

We present the algorithms, their theoretical analysis, and simulation results on different datasets.
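To make the sampling idea concrete, the following is a minimal sketch of single-pass quantile estimation from a uniform random sample. It is not the algorithm presented in the paper: it uses plain reservoir sampling (Algorithm R) as a stand-in, and the names `reservoir_sample` and `approx_quantile`, the sample size `k`, and the `seed` parameter are all assumptions made for illustration. The sketch carries none of the paper's explicit guarantees; it only shows how a bounded-memory sample can answer a quantile query with an error that is probabilistic rather than deterministic.

```python
import random


def reservoir_sample(stream, k, rng):
    """Return a uniform random sample of up to k items from one pass (Algorithm R)."""
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(x)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = x
    return sample


def approx_quantile(stream, phi, k=1000, seed=0):
    """Estimate the phi-quantile (0 <= phi <= 1) using O(k) memory.

    Illustrative only: the estimate is the phi-quantile of the sample,
    so its rank error shrinks as k grows but is never guaranteed.
    """
    rng = random.Random(seed)
    sample = sorted(reservoir_sample(stream, k, rng))
    if not sample:
        raise ValueError("empty stream")
    idx = min(len(sample) - 1, int(phi * len(sample)))
    return sample[idx]


if __name__ == "__main__":
    # Approximate median of 0..99999 using only 1000 stored values.
    print(approx_quantile(range(100_000), 0.5))
```

In this sketch the sample size `k` plays the role of the memory budget: a larger `k` tightens the likely rank error at the cost of more storage, which is the trade-off a confidence parameter would control.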