Sampling techniques are becoming increasingly important for very large databases. However, the problem of obtaining a random sample from index structures has not received much attention. In this paper, we examine sampling techniques for the B^+-tree. Because the fanout of each node varies, a random walk through the index structure does not produce a representative sample of the data set. We propose a new technique, called B^+-Tree based Weighted Random Sampling (BTWRS), that alters the inclusion probabilities of records accordingly, allowing more records to be extracted from leaves along paths with higher fanouts. We evaluated our method extensively, and the results show that BTWRS improves on the existing schemes in both the quality of the samples obtained and the efficiency of the sampling process. The proposed method can be readily adopted in existing commercial systems.
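The abstract's point about varying fanout can be illustrated with a small sketch. The Python below is not BTWRS, whose details the abstract does not give. It only computes, for a hypothetical toy tree, the exact probability that a single uniform random walk from root to leaf returns each record, and then applies a generic acceptance/rejection correction for comparison. The names `tree`, `inclusion_probabilities`, and `max_fanout` are assumptions introduced for this illustration.

```python
from fractions import Fraction

# Hypothetical toy index: lists are internal nodes, tuples are leaves holding records.
tree = [
    [("a", "b"), ("c", "d", "e")],
    [("f",), ("g", "h"), ("i", "j"), ("k",)],
]

def inclusion_probabilities(node, max_fanout=None, prob=Fraction(1)):
    """Probability that one root-to-leaf random walk returns each record.

    With max_fanout=None the walk picks uniformly among a node's entries.
    With max_fanout set, each step additionally survives with probability
    len(node) / max_fanout (a generic acceptance/rejection correction,
    assumed here for illustration; it is not the paper's method).
    """
    step = Fraction(1, len(node))
    if max_fanout is not None:
        step *= Fraction(len(node), max_fanout)
    share = prob * step
    if isinstance(node, tuple):  # leaf: every record gets an equal share
        return {record: share for record in node}
    result = {}
    for child in node:  # internal node: recurse into each child
        result.update(inclusion_probabilities(child, max_fanout, share))
    return result

# Plain random walk: probabilities depend on the fanouts along each path.
naive = inclusion_probabilities(tree)

# Same walk with rejection; 4 is the largest node size in this toy tree.
corrected = inclusion_probabilities(tree, max_fanout=4)
```

In this toy tree the plain walk returns record "a" with probability 1/8 but record "g" with probability 1/16, so the sample is skewed by node occupancy. The rejection variant makes every record equally likely (1/64 per attempt) at the cost of discarding most walks, which is the kind of quality versus efficiency trade-off the abstract refers to.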