Random sampling with a reservoir
ACM Transactions on Mathematical Software (TOMS)
Processing aggregate relational queries with hard time constraints
SIGMOD '89 Proceedings of the 1989 ACM SIGMOD international conference on Management of data
Estimating the size of generalized transitive closures
VLDB '89 Proceedings of the 15th international conference on Very large data bases
VLDB '89 Proceedings of the 15th international conference on Very large data bases
Statistical estimators for relational algebra expressions
Proceedings of the seventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Performance analysis of linear hashing with partial expansions
ACM Transactions on Database Systems (TODS)
Extendible hashing—a fast access method for dynamic files
ACM Transactions on Database Systems (TODS)
Secure statistical databases with random sample queries
ACM Transactions on Database Systems (TODS)
Analysis and performance of inverted data base structures
Communications of the ACM
Simple Random Sampling from Relational Databases
VLDB '86 Proceedings of the 12th International Conference on Very Large Data Bases
Sampling Algorithms for Differential Batch Retrieval Problems (Extended Abstract)
Proceedings of the 11th Colloquium on Automata, Languages and Programming
New Strategies for Computing the Transitive Closure of a Database Relation
VLDB '87 Proceedings of the 13th International Conference on Very Large Data Bases
Computer based management information systems embodying answer accuracy as a user parameter
Computer based management information systems embodying answer accuracy as a user parameter
Processing time-constrained aggregate queries in CASE-DB
ACM Transactions on Database Systems (TODS)
On the relative cost of sampling for join selectivity estimation
PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Density biased sampling: an improved method for data mining and clustering
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
An Evaluation of Non-Equijoin Algorithms
VLDB '91 Proceedings of the 17th International Conference on Very Large Data Bases
Practical Skew Handling in Parallel Joins
VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
Algebraic Optimization of Computations over Scientific Databases
VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases
Online maintenance of very large random samples
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
A disk-based join with probabilistic guarantees
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Confidence bounds for sampling-based group by estimates
ACM Transactions on Database Systems (TODS)
Maintaining very large random samples using the geometric file
The VLDB Journal — The International Journal on Very Large Data Bases
Online maintenance of very large random samples on flash storage
Proceedings of the VLDB Endowment
Online maintenance of very large random samples on flash storage
The VLDB Journal — The International Journal on Very Large Data Bases
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Foundations and Trends in Databases
Hi-index | 0.00 |
In this paper we discuss simple random sampling from hash files on secondary storage. We consider both iterative and batch sampling algorithms from both static and dynamic hashing methods. The static methods considered are open addressing hash files and hash files with separate overflow chains. The dynamic hashing methods considered are Linear Hash files [Lit80] and Extendible Hash files [FNPS79]. We give the cost of sampling in terms of the cost of successfully searching a hash file and show how to exploit features of the dynamic hashing methods to improve sampling efficiency.