Probabilistic counting algorithms for data base applications
Journal of Computer and System Sciences
Size-estimation framework with applications to transitive closure and reachability
Journal of Computer and System Sciences
Mirror, mirror on the Web: a study of host pairs with replicated content
WWW '99 Proceedings of the eighth international conference on World Wide Web
A protocol-independent technique for eliminating redundant network traffic
Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication
Min-wise independent permutations
Journal of Computer and System Sciences - 30th annual ACM symposium on theory of computing
Finding Interesting Associations without Support Pruning
IEEE Transactions on Knowledge and Data Engineering
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
Efficient estimation algorithms for neighborhood variance and other moments
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Estimating arbitrary subset sums with few probes
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
A 9.5mW 4GHz WCDMA frequency synthesizer in 0.13μm CMOS
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Spatially-decaying aggregation over a network
Journal of Computer and System Sciences
Sketching unaggregated data streams for subpopulation-size queries
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Sketching unaggregated data streams for subpopulation-size queries
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Summarizing data using bottom-k sketches
Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
Algorithms and estimators for accurate summarization of internet traffic
Proceedings of the 7th ACM SIGCOMM conference on Internet measurement
Priority sampling for estimation of arbitrary subset sums
Journal of the ACM (JACM)
Tighter estimation using bottom k sketches
Proceedings of the VLDB Endowment
Stream sampling for variance-optimal estimation of subset sums
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
A comparison of three algorithms for approximating the distance distribution in real-world graphs
TAPAS'11 Proceedings of the First international ICST conference on Theory and practice of algorithms in (computer) systems
Towards a universal sketch for origin-destination network measurements
NPC'11 Proceedings of the 8th IFIP international conference on Network and parallel computing
Efficient Stream Sampling for Variance-Optimal Estimation of Subset Sums
SIAM Journal on Computing
CRSI: a compact randomized similarity index for set-valued features
Proceedings of the 15th International Conference on Extending Database Technology
Hi-index | 0.00 |
A Bottom-k sketch is a summary of a set of items with nonnegative weights. Each such summary allows us to compute approximate aggregates over the set of items. Bottom-k sketches are obtained by associating with each item in a ground set an independent random rank drawn from a probability distribution that depends on the weight of the item. For each subset of interest, the bottom-k sketch is the set of the k minimum ranked items and their ranks. Bottom-k sketches have numerous applications. We develop and analyze data structures and estimators for bottom-k sketches to facilitate their deployment. We develop novel estimators and algorithms that show that they are a superior alternative to other sketching methods in both efficiency of obtaining the sketches and the accuracy of the estimates derived from the sketches.