Secure distributed data-mining and its application to large-scale network measurements

  • Authors:
  • Matthew Roughan; Yin Zhang

  • Affiliations:
  • University of Adelaide, Australia; University of Texas at Austin, Austin, TX

  • Venue:
  • ACM SIGCOMM Computer Communication Review
  • Year:
  • 2006

Abstract

The rapid growth of the Internet over the last decade has been startling. However, efforts to track its growth have often fallen afoul of bad data. For instance, how much traffic does the Internet now carry? The problem is not that the data is technically hard to obtain, or that it does not exist, but rather that the data is not shared. Obtaining an overall picture requires data from multiple sources, few of which are open to sharing such data, either because doing so violates privacy legislation or because it exposes business secrets. Likewise, detection of global Internet health problems is hampered by a lack of data sharing. The approaches used so far in the Internet, e.g. trusted third parties or data anonymization, have been only partially successful, and are not widely adopted.

The paper presents a method for performing computations on shared data without any participant revealing its secret data. For example, one can compute the sum of traffic over a set of service providers without any service provider learning the traffic of another. The method is simple, scalable, and flexible enough to perform a wide range of valuable operations on Internet data.
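
The summation example can be illustrated with a minimal sketch based on additive secret sharing, a standard technique for this kind of problem. The abstract does not spell out the paper's protocol, so this should be read as one plausible instance rather than the authors' exact method. The names `make_shares`, `secure_sum`, and `MODULUS` are hypothetical, and the sketch assumes honest-but-curious providers and a modulus larger than any possible total.

```python
import secrets

# Assumption: the modulus exceeds any realistic total, so the sum never wraps.
MODULUS = 2**64


def make_shares(value, n_parties, modulus=MODULUS):
    """Split `value` into n_parties random shares that sum to it mod `modulus`."""
    # The first n-1 shares are uniformly random; each reveals nothing on its own.
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the secret value.
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate all providers in one process and return the total of their values."""
    n = len(private_values)
    # Row i holds the shares provider i hands out; column j is what provider j receives.
    distributed = [make_shares(v, n, modulus) for v in private_values]
    # Each provider publishes only the sum of the shares it received.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Adding the published partial sums recovers the total, not the individual inputs.
    return sum(partial_sums) % modulus


if __name__ == "__main__":
    # Hypothetical per-provider traffic volumes, in arbitrary units.
    traffic = [120, 340, 75]
    print(secure_sum(traffic))
```

In this sketch every provider sees only random-looking shares and the published partial sums, so the total is recoverable while individual traffic volumes stay hidden unless all other providers collude. A real deployment would also need authenticated, encrypted channels between providers, which this single-process simulation omits.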