Random sampling with a reservoir
ACM Transactions on Mathematical Software (TOMS)
A logarithmic time sort for linear size networks
Journal of the ACM (JACM)
Practical selectivity estimation through adaptive sampling
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
On the propagation of errors in the size of join results
SIGMOD '91 Proceedings of the 1991 ACM SIGMOD international conference on Management of data
Improved histograms for selectivity estimation of range predicates
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Histogram-based estimation techniques in database systems
Histogram-based estimation techniques in database systems
New sampling-based summary statistics for improving approximate query answers
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Random sampling for histogram construction: how much is enough?
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Wavelet-based histograms for selectivity estimation
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Self-tuning histograms: building histograms without looking at data
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
A comparison of selectivity estimators for range queries on metric attributes
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Join synopses for approximate query answering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Implications of certain assumptions in database performance evauation
ACM Transactions on Database Systems (TODS)
Approximating multi-dimensional aggregate range queries over real attributes
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Congressional samples for approximate answering of group-by queries
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Space-efficient online computation of quantile summaries
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Independence is good: dependency-based histogram synopses for high-dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
STHoles: a multidimensional workload-aware histogram
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
A robust, optimization-based approach for approximate answering of aggregate queries
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Fast, small-space algorithms for approximate histogram maintenance
STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
Access path selection in a relational database management system
SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
Optimal Histograms with Quality Guarantees
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Histogram-Based Approximation of Set-Valued Query-Answers
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Dynamic Maintenance of Wavelet-Based Histograms
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
ICICLES: Self-Tuning Samples for Approximate Query Answering
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
Proceedings of the 27th International Conference on Very Large Data Bases
Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Selectivity Estimation Without the Attribute Value Independence Assumption
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
The optimization of queries in relational databases
The optimization of queries in relational databases
How to summarize the universe: dynamic maintenance of quantiles
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Querying about the Past, the Present, and the Future in Spatio-Temporal Databases
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Online maintenance of very large random samples
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Adapting to source properties in processing data integration queries
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
IMAX: Incremental Maintenance of Schema-Based XML Statistics
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Discretization from data streams: applications to histograms and data mining
Proceedings of the 2006 ACM symposium on Applied computing
Approximate quantiles and the order of the stream
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Spatio-temporal data reduction with deterministic error bounds
The VLDB Journal — The International Journal on Very Large Data Bases
A dip in the reservoir: maintaining sample synopses of evolving datasets
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Efficient peer-to-peer semantic overlay networks based on statistical language models
P2PIR '06 Proceedings of the international workshop on Information retrieval in peer-to-peer networks
Random Sampling for Continuous Streams with Arbitrary Updates
IEEE Transactions on Knowledge and Data Engineering
Compressed histograms with arbitrary bucket layouts for selectivity estimation
Information Sciences: an International Journal
Hierarchical synopses with optimal error guarantees
ACM Transactions on Database Systems (TODS)
Maintaining very large random samples using the geometric file
The VLDB Journal — The International Journal on Very Large Data Bases
Workload-Aware Histograms for Remote Applications
DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
A framework for estimating complex probability density structures in data streams
Proceedings of the 17th ACM conference on Information and knowledge management
The average-case complexity of counting distinct elements
Proceedings of the 12th International Conference on Database Theory
Multiplicative synopses for relative-error metrics
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Self-tuning management of update-intensive multidimensional data in clusters of workstations
The VLDB Journal — The International Journal on Very Large Data Bases
Enabling OLAP in mobile environments via intelligent data cube compression techniques
Journal of Intelligent Information Systems
Fast and effective histogram construction
Proceedings of the 18th ACM conference on Information and knowledge management
Dimension table driven approach to referential partition relational data warehouses
Proceedings of the ACM twelfth international workshop on Data warehousing and OLAP
Statistical structures for Internet-scale data management
The VLDB Journal — The International Journal on Very Large Data Bases
Optimality and scalability in lattice histogram construction
Proceedings of the VLDB Endowment
A statistics propagation approach to enable cost-based optimization of statement sequences
ADBIS'07 Proceedings of the 11th East European conference on Advances in databases and information systems
Hierarchically organized skew-tolerant histograms for geographic data objects
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Towards approximate SQL: infobright's approach
RSCTC'10 Proceedings of the 7th international conference on Rough sets and current trends in computing
Efficient construction of histograms for multidimensional data using quad-trees
Decision Support Systems
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
Information Sciences: an International Journal
Efficient approximate visibility query in large dynamic environments
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part I
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Foundations and Trends in Databases
HEDC: a histogram estimator for data in the cloud
Proceedings of the fourth international workshop on Cloud data management
Sort-based parallel loading of R-trees
Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data
Scalable and dynamically balanced shared-everything OLTP with physiological partitioning
The VLDB Journal — The International Journal on Very Large Data Bases
Adaptive stratified reservoir sampling over heterogeneous data streams
Information Systems
Data & Knowledge Engineering
Exploring optimization and caching for efficient collection operations
Automated Software Engineering
Hi-index | 0.00 |
Many commercial database systems maintain histograms to summarize the contents of large relations and permit efficient estimation of query result sizes for use in query optimizers. Delaying the propagation of database updates to the histogram often introduces errors into the estimation. This article presents new sampling-based approaches for incremental maintenance of approximate histograms. By scheduling updates to the histogram based on the updates to the database, our techniques are the first to maintain histograms effectively up to date at all times and avoid computing overheads when unnecessary. Our techniques provide highly accurate approximate histograms belonging to the equidepth and Compressed classes. Experimental results show that our new approaches provide orders of magnitude more accurate estimation than previous approaches.An important aspect employed by these new approaches is a backing sample, an up-to-date random sample of the tuples currently in a relation. We provide efficient solutions for maintaining a uniformly random sample of a relation in the presence of updates to the relation. The backing sample techniques can be used for any other application that relies on random samples of data.