A major bottleneck in implementing sampling as a primitive relational operation is the inefficiency of sampling the output of a query. It is not even known whether it is possible to generate a sample of a join tree without first evaluating the join tree completely. We undertake a detailed study of this problem and analyze it in a variety of settings. We present theoretical results that explain the difficulty of this problem and set limits on the efficiency that can be achieved. Based on new insights into the interaction between join and sampling, we develop join sampling techniques for the settings where our negative results do not apply. Our new sampling algorithms are significantly more efficient than those known earlier. We present an experimental evaluation of our techniques on Microsoft SQL Server 7.0.
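The interaction between join and sampling can be illustrated on a two-way equijoin R ⋈ S. The sketch below is a hypothetical illustration, not the authors' code. `naive_sample_join` joins independent Bernoulli samples of each input. `weighted_join_sample` instead picks an R tuple with probability proportional to its number of join partners in S and then picks one partner uniformly. This is one technique in the spirit of the abstract, and it yields a uniform with-replacement sample of R ⋈ S without computing the join. The sketch assumes tuples are `(join_key, payload)` pairs and that S's per-key tuples can be looked up. Here they come from an in-memory index built by scanning S. All function names are illustrative.

```python
import random
from collections import defaultdict

# Tuples are modeled as (join_key, payload); the join predicate is equality on join_key.


def naive_sample_join(R, S, f, rng):
    """Join a Bernoulli(f) sample of R with a Bernoulli(f) sample of S.

    A join tuple (r, s) survives only if both r and s survive, i.e. with
    probability f * f, and join tuples that share r (or s) survive or vanish
    together, so the result is not an independent sample of R join S.
    """
    r_kept = [r for r in R if rng.random() < f]
    s_kept = [s for s in S if rng.random() < f]
    return [(r, s) for r in r_kept for s in s_kept if r[0] == s[0]]


def weighted_join_sample(R, S, n, rng):
    """Draw n tuples uniformly at random, with replacement, from R join S
    without materializing the join.

    Assumption: the S tuples for each join key can be looked up (here via an
    in-memory index built by one scan of S; a real system might use an
    existing index or frequency statistics instead).
    """
    index = defaultdict(list)
    for s in S:
        index[s[0]].append(s)
    # m(r): number of S tuples that join with r.
    weights = [len(index[r[0]]) for r in R]
    if sum(weights) == 0:
        return []  # empty join: nothing to sample
    # Pick r with probability m(r) / |R join S| ...
    picks = rng.choices(R, weights=weights, k=n)
    # ... then one of its m(r) partners uniformly, so each (r, s) has probability 1 / |R join S|.
    return [(r, rng.choice(index[r[0]])) for r in picks]


if __name__ == "__main__":
    rng = random.Random(42)
    # Skewed toy data: key 1 has 1 partner in S, key 2 has 9.
    R = [(1, "r1"), (2, "r2")]
    S = [(1, "s0")] + [(2, f"s{i}") for i in range(1, 10)]
    sample = weighted_join_sample(R, S, 10_000, rng)
    share_key1 = sum(1 for r, _ in sample if r[0] == 1) / len(sample)
    print(share_key1)  # should be close to 0.1: key 1 accounts for 1 of the 10 join tuples
```

The weighting is what removes skew bias. Another approach would sample R uniformly and then pick a random partner. That would give each (r, s) probability 1/(|R|·m(r)), which over-represents keys that are rare in S. The cost of the weighted version is that it needs per-key frequency information about S. This sketch obtains that information by scanning S.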