Sampling strategies for epidemic-style information dissemination

Authors:
Milan Vojnović;Varun Gupta;Thomas Karagiannis;Christos Gkantsidis
Affiliations:
Microsoft Research Cambridge, Cambridge, UK;Carnegie Mellon University, Pittsburgh, PA and Microsoft Research Cambridge, Cambridge, UK;Microsoft Research Cambridge, Cambridge, UK;Microsoft Research Cambridge, Cambridge, UK
Venue:
IEEE/ACM Transactions on Networking (TON)
Year:
2010

Abstract

We consider epidemic-style information dissemination strategies that leverage the nonuniformity of host distribution over subnets (e.g., IP subnets) to optimize the information spread. Such epidemic-style strategies are based on random sampling of target hosts according to a sampling rule. In this paper, we consider the metric of the total number of samplings (equivalently, probes) needed to reach a given target fraction of the host population. We first identify the minimum number of samplings needed to reach a target fraction of hosts, assuming global information about the host distribution over subnets is available. We show that this optimum can be achieved either by a dynamic strategy, for which the sampling probabilities over subnets are allowed to vary over time, or, surprisingly, even by a static strategy, for which the sampling probabilities over subnets are fixed. These results provide insights into the best achievable performance and into how different system parameters affect the number of samplings needed. We then consider simple online sampling strategies that do not require any prior knowledge of the distribution of hosts over subnets; instead, each host biases its sampling based on its observed sampling outcomes while keeping only O(1) state at any point in time. Using real data sets from several large-scale Internet measurements, we evaluate the significance of the system parameters that determine the sampling requirements and compare the performance of our proposed distribution-oblivious sampling strategies to the theoretical bound. Our results provide insights for the design of efficient information dissemination systems, as well as for the design of countermeasures against worms that use subnet-preferential scanning.
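To make the metric concrete, here is a minimal sketch of counting samplings under a *static* strategy. It rests on a simplified toy model that is an assumption, not the paper's formulation: every subnet has the same number of addresses, one sampling picks subnet j with a fixed probability and then an address uniformly within it, and every informed host uses the same fixed probabilities. Under that model, the number of distinct hosts reached depends only on the total number of samplings, not on which hosts issued them (the initially informed host is ignored). The function names `expected_fraction_reached` and `samplings_needed`, the host counts, and the "proportional to host counts" rule are all hypothetical; the proportional rule is a simple heuristic, not the optimal static strategy derived in the paper. The sketch also uses the *expected* reached fraction rather than the random stopping time.

```python
from typing import Sequence


def expected_fraction_reached(
    n: int, hosts: Sequence[int], probs: Sequence[float], subnet_size: int
) -> float:
    # Toy model (assumption): one sampling picks subnet j with probability
    # probs[j], then one of subnet_size addresses uniformly at random, so a
    # given host in subnet j is hit by a single sampling with probability
    # probs[j] / subnet_size, independently across samplings.
    total_hosts = sum(hosts)
    reached = sum(
        h * (1.0 - (1.0 - p / subnet_size) ** n) for h, p in zip(hosts, probs)
    )
    return reached / total_hosts


def samplings_needed(
    target: float,
    hosts: Sequence[int],
    probs: Sequence[float],
    subnet_size: int,
    max_n: int = 10**9,
) -> int:
    # Smallest total number of samplings n whose *expected* reached fraction
    # is at least target. The expected fraction is nondecreasing in n, so a
    # binary search over n is valid.
    if expected_fraction_reached(max_n, hosts, probs, subnet_size) < target:
        raise ValueError("target unreachable with these sampling probabilities")
    lo, hi = 0, max_n
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_fraction_reached(mid, hosts, probs, subnet_size) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    # Hypothetical population: 4 subnets of 256 addresses each, with hosts
    # concentrated in the first subnet and the last subnet empty.
    hosts = [200, 40, 10, 0]
    size = 256
    uniform = [0.25] * 4                      # ignores the host distribution
    biased = [h / sum(hosts) for h in hosts]  # heuristic: proportional to hosts
    for name, probs in [("uniform", uniform), ("proportional", biased)]:
        print(name, samplings_needed(0.5, hosts, probs, size))
```

In this toy population the subnet-biased static rule needs noticeably fewer samplings than uniform sampling to reach half of the hosts, which illustrates, under the stated assumptions only, why exploiting nonuniform host distribution over subnets can reduce the sampling cost.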