Random sampling with a reservoir
ACM Transactions on Mathematical Software (TOMS)
Probabilistic counting algorithms for data base applications
Journal of Computer and System Sciences
Equi-depth multidimensional histograms
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Processing aggregate relational queries with hard time constraints
SIGMOD '89 Proceedings of the 1989 ACM SIGMOD international conference on Management of data
Computing
A linear-time probabilistic counting algorithm for database applications
ACM Transactions on Database Systems (TODS)
A guided tour of Chernoff bounds
Information Processing Letters
VLDB '89 Proceedings of the 15th international conference on Very large data bases
Random sampling from hash files
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
On the propagation of errors in the size of join results
SIGMOD '91 Proceedings of the 1991 ACM SIGMOD international conference on Management of data
Sequential sampling procedures for query size estimation
SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
Ten lectures on wavelets
Fixed-precision estimation of join selectivity
PODS '93 Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Optimal histograms for limiting worst-case error propagation in the size of join results
ACM Transactions on Database Systems (TODS)
Probabilistic methods in query processing
Probabilistic methods in query processing
On the relative cost of sampling for join selectivity estimation
PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Randomized algorithms
Balancing histogram optimality and practicality for query result size estimation
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Bifocal sampling for skew-resistant join size estimation
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Improved histograms for selectivity estimation of range predicates
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Selectivity and cost estimation for joins based on random sampling
Journal of Computer and System Sciences
The space complexity of approximating the frequency moments
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Communication complexity
New sampling-based summary statistics for improving approximate query answers
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Random sampling for histogram construction: how much is enough?
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Wavelet-based histograms for selectivity estimation
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
CONTROL: continuous output and navigation technology with refinement on-line
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Min-wise independent permutations (extended abstract)
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
A complete guide to DB2 universal database
A complete guide to DB2 universal database
Wavelets for computer graphics: theory and applications
Wavelets for computer graphics: theory and applications
Tracking join and self-join sizes in limited storage
PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Online association rule mining
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Approximate computation of multidimensional aggregates of sparse data using wavelets
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
A comparison of selectivity estimators for range queries on metric attributes
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Join synopses for approximate query answering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Ripple joins for online aggregation
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
The Aqua approximate query answering system
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Statistical estimators for relational algebra expressions
Proceedings of the seventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Towards estimation error guarantees for distinct values
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Approximating multi-dimensional aggregate range queries over real attributes
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Even strongly universal hashing is pretty fast
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
On the approximation of curves by line segments using dynamic programming
Communications of the ACM
Space-efficient online computation of quantile summaries
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Independence is good: dependency-based histogram synopses for high-dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
STHoles: a multidimensional workload-aware histogram
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Global optimization of histograms
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
A robust, optimization-based approach for approximate answering of aggregate queries
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Selectivity estimation using probabilistic models
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Estimating simple functions on the union of data streams
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Building regression cost models for multidatabase systems
DIS '96 Proceedings of the fourth international conference on on Parallel and distributed information systems
Fast, small-space algorithms for approximate histogram maintenance
STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Fast algorithms for hierarchical range histogram construction
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Sampling from a moving window over streaming data
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Maintaining stream statistics over sliding windows: (extended abstract)
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Processing complex aggregate queries over data streams
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
A scalable hash ripple join algorithm
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Exploiting statistics on query expressions for optimization
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Dynamic multidimensional histograms
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Wavelet synopses with error guarantees
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Querying and mining data streams: you only get one look a tutorial
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Distributed streams algorithms for sliding windows
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Fast incremental maintenance of approximate histograms
ACM Transactions on Database Systems (TODS)
RHist: adaptive summarization over continuous data streams
Proceedings of the eleventh international conference on Information and knowledge management
Accurate estimation of the number of tuples satisfying a condition
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
New directions in traffic measurement and accounting
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications
Maintenance of Materialized Views of Sampling Queries
Proceedings of the Eighth International Conference on Data Engineering
On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity, and Applications
ICDT '99 Proceedings of the 7th International Conference on Database Theory
Estimating Range Queries Using Aggregate Data with Integrity Constraints: A Probabilistic Approach
ICDT '01 Proceedings of the 8th International Conference on Database Theory
Approximations in Database Systems
ICDT '03 Proceedings of the 9th International Conference on Database Theory
Optimal Histograms with Quality Guarantees
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Aqua: A Fast Decision Support Systems Using Approximate Query Answers
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Simple Random Sampling from Relational Databases
VLDB '86 Proceedings of the 12th International Conference on Very Large Data Bases
Probabilistic Optimization of Top N Queries
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Histogram-Based Approximation of Set-Valued Query-Answers
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Approximate Query Processing Using Wavelets
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Dynamic Maintenance of Wavelet-Based Histograms
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Potter's Wheel: An Interactive Data Cleaning System
Proceedings of the 27th International Conference on Very Large Data Bases
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries
Proceedings of the 27th International Conference on Very Large Data Bases
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
Proceedings of the 27th International Conference on Very Large Data Bases
Approximate Query Processing: Taming the TeraBytes
Proceedings of the 27th International Conference on Very Large Data Bases
Random Sampling from Pseudo-Ranked B+ Trees
VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
Universality of Serial Histograms
VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases
Sampling-Based Estimation of the Number of Distinct Values of an Attribute
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Selectivity Estimation Without the Attribute Value Independence Assumption
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
On Linear-Spline Based Histograms
WAIM '02 Proceedings of the Third International Conference on Advances in Web-Age Information Management
Finding Frequent Items in Data Streams
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Histogramming Data Streams with Fast Per-Item Processing
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
ICALP '97 Proceedings of the 24th International Colloquium on Automata, Languages and Programming
Large-Sample and Deterministic Confidence Intervals for Online Aggregation
SSDBM '97 Proceedings of the Ninth International Conference on Scientific and Statistical Database Management
Fast Approximate Answers to Aggregate Queries on a Data Cube
SSDBM '99 Proceedings of the 11th International Conference on Scientific and Statistical Database Management
Counting Distinct Elements in a Data Stream
RANDOM '02 Proceedings of the 6th International Workshop on Randomization and Approximation Techniques
Binary-Tree Histograms with Tree Indices
DEXA '02 Proceedings of the 13th International Conference on Database and Expert Systems Applications
The VLDB Journal — The International Journal on Very Large Data Bases
Approximate query processing using wavelets
The VLDB Journal — The International Journal on Very Large Data Bases
One-Pass Wavelet Decompositions of Data Streams
IEEE Transactions on Knowledge and Data Engineering
Utilizing histogram information
CASCON '01 Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research
Error Minimization for Approximate Computation of Range Aggregates
DASFAA '03 Proceedings of the Eighth International Conference on Database Systems for Advanced Applications
Stable distributions, pseudorandom generators, embeddings and data stream computation
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Range Selectivity Estimation for Continuous Attributes
SSDBM '99 Proceedings of the 11th International Conference on Scientific and Statistical Database Management
Algorithms for Index-Assisted Selectivity Estimation
ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Dynamic Histograms: Capturing Evolving Data Sets
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Estimating flow distributions from sampled flow statistics
Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications
Extended wavelets for multiple measures
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Processing set expressions over continuous update streams
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Dynamic sample selection for approximate query processing
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Gigascope: a stream database for network applications
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Fast Mining of Massive Tabular Data via Approximate Distance Computations
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Tight Lower Bounds for the Distinct Elements Problem
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Gossip-Based Computation of Aggregate Information
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Dynamically maintaining frequent items over a data stream
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
A multi-dimensional histogram for selectivity estimation and fast approximate query answering
CASCON '03 Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative research
Probabilistic wavelet synopses
ACM Transactions on Database Systems (TODS)
Approximate Aggregation Techniques for Sensor Databases
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Optimal space lower bounds for all frequency moments
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Tabulation based 4-universal hashing with applications to second moment estimation
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Finding frequent items in data streams
Theoretical Computer Science - Special issue on automata, languages and programming
Holistic UDAFs at streaming speeds
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
A bi-level Bernoulli scheme for database sampling
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Effective use of block-level sampling in statistics estimation
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Online maintenance of very large random samples
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Approximation techniques for spatial data
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Structure choices for two-dimensional histogram construction
CASCON '04 Proceedings of the 2004 conference of the Centre for Advanced Studies on Collaborative research
Fast range query estimation by N-level tree histograms
Data & Knowledge Engineering
Reducing data stream sliding windows by cyclic tree-like histograms
PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Deterministic wavelet thresholding for maximum-error metrics
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Optimal approximations of the frequency moments of data streams
Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
Sampling in dynamic data streams and applications
SCG '05 Proceedings of the twenty-first annual symposium on Computational geometry
Histograms revisited: when are histograms the best approximation method for aggregates over joins?
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
An information theoretic histogram for single dimensional selectivity estimation
Proceedings of the 2005 ACM symposium on Applied computing
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
Probability and Computing: Randomized Algorithms and Probabilistic Analysis
Probability and Computing: Randomized Algorithms and Probabilistic Analysis
Wavelet synopsis for data streams: minimizing non-euclidean error
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Sketching streams through the net: distributed approximate query tracking
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Summarizing and mining inverse distributions on data streams via dynamic inverse sampling
VLDB '05 Proceedings of the 31st international conference on Very large data bases
One-pass wavelet synopses for maximum-error metrics
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Online estimation for subset-based SQL queries
VLDB '05 Proceedings of the 31st international conference on Very large data bases
CXHist: an on-line classification-based histogram for XML string selectivity estimation
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Approximation algorithms for wavelet transform coding of data streams
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Simpler algorithm for estimating frequency moments of data streams
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Wavelet synopses for general error metrics
ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2004
Synopses for query optimization: A space-complexity perspective
ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2004
What's new: finding significant differences in network data streams
IEEE/ACM Transactions on Networking (TON)
A quad-tree based multiresolution approach for two-dimensional summary data
SSDBM '03 Proceedings of the 15th International Conference on Scientific and Statistical Database Management
A fast approximation scheme for probabilistic wavelet synopses
SSDBM'2005 Proceedings of the 17th international conference on Scientific and statistical database management
Techniques for Warehousing of Sample Data
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
New Sampling-Based Estimators for OLAP Queries
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
ISOMER: Consistent Histogram Construction Using Query Feedback
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
XCluster Synopses for Structured XML Content
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Approximation and streaming algorithms for histogram construction problems
ACM Transactions on Database Systems (TODS)
Space- and time-efficient deterministic algorithms for biased quantiles over data streams
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
A dip in the reservoir: maintaining sample synopses of evolving datasets
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
On biased reservoir sampling in the presence of stream evolution
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Compact histograms for hierarchical identifiers
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
XSKETCH synopses for XML data graphs
ACM Transactions on Database Systems (TODS)
Data streams: algorithms and applications
Foundations and Trends® in Theoretical Computer Science
High-throughput sketch update on a low-power stream processor
Proceedings of the 2006 ACM/IEEE symposium on Architecture for networking and communications systems
ACM Transactions on Database Systems (TODS)
Multi-dimensional Histograms with Tight Bounds for the Error
IDEAS '06 Proceedings of the 10th International Database Engineering and Applications Symposium
Data Stream Management: Processing High-Speed Data Streams (Data-Centric Systems and Applications)
Data Stream Management: Processing High-Speed Data Streams (Data-Centric Systems and Applications)
Compressed histograms with arbitrary bucket layouts for selectivity estimation
Information Sciences: an International Journal
Optimal workload-based weighted wavelet synopses
Theoretical Computer Science
Optimized stratified sampling for approximate query processing
ACM Transactions on Database Systems (TODS)
Extended wavelets for multiple measures
ACM Transactions on Database Systems (TODS)
Counting distinct items over update streams
Theoretical Computer Science
On synopses for distinct-value estimation under multiset operations
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Scalable approximate query processing with the DBO engine
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Improving sketch reconstruction accuracy using linear least squares method
IMC '05 Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement
Maintaining bernoulli samples over evolving multisets
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
A Fast Algorithm for Approximate Quantiles in High Speed Data Streams
SSDBM '07 Proceedings of the 19th International Conference on Scientific and Statistical Database Management
A Note on Linear Time Algorithms for Maximum Error Histograms
IEEE Transactions on Knowledge and Data Engineering
Selectivity estimation by batch-query based histogram and parametric method
ADC '07 Proceedings of the eighteenth conference on Australasian database - Volume 63
Summarizing data using bottom-k sketches
Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
Exploiting duality in summarization with deterministic guarantees
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
A near-optimal algorithm for computing the entropy of a stream
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Comparing data streams using Hamming norms (how to zero in)
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
How to summarize the universe: dynamic maintenance of quantiles
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
The history of histograms (abridged)
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
SASH: a self-adaptive histogram set for dynamically changing workloads
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
XWAVE: optimal and approximate extended wavelets
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
REHIST: relative error histogram construction algorithms
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Reversible sketches: enabling monitoring and analysis over high-speed data streams
IEEE/ACM Transactions on Networking (TON)
A Bayesian method for guessing the extreme values in a data set?
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Range-Efficient Counting of Distinct Elements in a Massive Data Stream
SIAM Journal on Computing
Algorithms for linear and nonlinear approximation of large data
Algorithms for linear and nonlinear approximation of large data
Maintaining bounded-size sample synopses of evolving datasets
The VLDB Journal — The International Journal on Very Large Data Bases
Proactive and reactive multi-dimensional histogram maintenance for selectivity estimation
Journal of Systems and Software
Why go logarithmic if we can go linear?: Towards effective distinct counting of search traffic
EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Histograms based on the minimum description length principle
The VLDB Journal — The International Journal on Very Large Data Bases
Processing top k queries from samples
CoNEXT '06 Proceedings of the 2006 ACM CoNEXT conference
Counter braids: a novel counter architecture for per-flow measurement
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Sampling time-based sliding windows in bounded space
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Sketches for size of join estimation
ACM Transactions on Database Systems (TODS)
Hierarchical synopses with optimal error guarantees
ACM Transactions on Database Systems (TODS)
Enhancing histograms by tree-like bucket indices
The VLDB Journal — The International Journal on Very Large Data Bases
The VLDB Journal — The International Journal on Very Large Data Bases
Finding frequent items in data streams
Proceedings of the VLDB Endowment
Sketching and Streaming Entropy via Approximation Theory
FOCS '08 Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science
TuG synopses for approximate query answering
ACM Transactions on Database Systems (TODS)
Robust approximate aggregation in sensor data management systems
ACM Transactions on Database Systems (TODS)
Unrestricted wavelet synopses under maximum error bound
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Statistical Analysis and Data Mining
Probabilistic databases: diamonds in the dirt
Communications of the ACM - Barbara Liskov: ACM's A.M. Turing Award Winner
Sampling-based estimators for subset-based queries
The VLDB Journal — The International Journal on Very Large Data Bases
Hierarchically compressed wavelet synopses
The VLDB Journal — The International Journal on Very Large Data Bases
Constant-Time Query Processing
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Lattice Histograms: a Resilient Synopsis Structure
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Randomized Synopses for Query Assurance on Data Streams
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
On Efficient Query Processing of Stream Counts on the Cell Processor
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
How to scalably and accurately skip past streams
ICDEW '07 Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering Workshop
Representing uncertain data: models, properties, and algorithms
The VLDB Journal — The International Journal on Very Large Data Bases
A Bayesian approach to bandwidth selection for multivariate kernel density estimation
Computational Statistics & Data Analysis
Probabilistic histograms for probabilistic data
Proceedings of the VLDB Endowment
Optimality and scalability in lattice histogram construction
Proceedings of the VLDB Endowment
Consistent histograms in the presence of distinct value counts
Proceedings of the VLDB Endowment
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads
Proceedings of the VLDB Endowment
Preventing bad plans by bounding the impact of cardinality estimation errors
Proceedings of the VLDB Endowment
Green: a framework for supporting energy-conscious programming using controlled approximation
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Optimal sampling from distributed streams
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Histograms reloaded: the merits of bucket diversity
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
1-pass relative-error Lp-sampling with applications
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Tight bounds for Lp samplers, finding duplicates in streams, and related problems
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Structure-aware sampling on data streams
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
How simulation languages should report results: a modest proposal
Winter Simulation Conference
Workload-optimal histograms on streams
ESA'05 Proceedings of the 13th annual European conference on Algorithms
Fast approximate wavelet tracking on streams
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Deferred maintenance of disk-based random samples
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Optimal workload-based weighted wavelet synopses
ICDT'05 Proceedings of the 10th international conference on Database Theory
Subquadratic algorithms for workload-aware haar wavelet synopses
FSTTCS '05 Proceedings of the 25th international conference on Foundations of Software Technology and Theoretical Computer Science
Adaptive spatial partitioning for multidimensional data streams
ISAAC'04 Proceedings of the 15th international conference on Algorithms and Computation
CR-PRECIS: a deterministic summary structure for update data streams
ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies
Continuous distributed counting for non-monotonic streams
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Sketching and streaming algorithms for processing massive data
XRDS: Crossroads, The ACM Magazine for Students - Big Data
Eagle-eyed elephant: split-oriented indexing in Hadoop
Proceedings of the 16th International Conference on Extending Database Technology
RFID-data compression for supporting aggregate queries
ACM Transactions on Database Systems (TODS)
Sampling estimators for parallel online aggregation
BNCOD'13 Proceedings of the 29th British National conference on Big Data
Anytime approximation in probabilistic databases
The VLDB Journal — The International Journal on Very Large Data Bases
ACM SIGMOD Record
Data management research at the technical university of crete
ACM SIGMOD Record
Hi-index | 0.00 |
Methods for Approximate Query Processing (AQP) are essential for dealing with massive data. They are often the only means of providing interactive response times when exploring massive datasets, and are also needed to handle high speed data streams. These methods proceed by computing a lossy, compact synopsis of the data, and then executing the query of interest against the synopsis rather than the entire dataset. We describe basic principles and recent developments in AQP. We focus on four key synopses: random samples, histograms, wavelets, and sketches. We consider issues such as accuracy, space and time efficiency, optimality, practicality, range of applicability, error bounds on query answers, and incremental maintenance. We also discuss the trade-offs between the different synopsis types.