Leveraging discarded samples for tighter estimation of multiple-set aggregates
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Hi-index | 0.01 |
Many datasets, including market basket data, text or hypertext documents, and measurement data collected in different nodes or time periods, are modeled as a collection of sets over a ground set of (weighted) items. We consider the problem of estimating basic aggregates such as the weight or selectivity of a subpopulation of the items. We extend classic summarization techniques based on sampling to this scenario when we have multiple sets and selection predicates based on membership in particular sets.