Preserving privacy in data mining via importance weighting

  • Authors: Charles Elkan

  • Affiliations: Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA

  • Venue: PSDML'10 Proceedings of the international ECML/PKDD conference on Privacy and security issues in data mining and machine learning

  • Year: 2010

Abstract

This paper presents a fundamentally new approach to allowing learning algorithms to be applied to a dataset, while still keeping the records in the dataset confidential. Let D be the set of records to be kept private, and let E be a fixed set of records from a similar domain that is already public. The idea is to compute and publish a weight w(x) for each record x in E that measures how representative it is of the records in D. Data mining on E using these importance weights is then approximately equivalent to data mining directly on D. The dataset D is used by its owner to compute the weights, but not revealed in any other way.
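The abstract does not say how the weights w(x) are computed, so the following is only an illustrative sketch under stated assumptions, not the paper's procedure. It assumes that a reasonable weight is the density ratio p_D(x) / p_E(x), and estimates that ratio by training a logistic classifier to tell records of D from records of E. The one-dimensional synthetic data, the function names (`fit_logistic`, `importance_weights`), and the learning-rate settings are all hypothetical. The sketch also makes no attempt to limit or quantify what the published weights reveal about D.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, steps=300, lr=0.5):
    """Fit p(y=1 | x) = sigmoid(a + b*x) by batch gradient ascent
    on the mean log-likelihood (hypothetical helper)."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(a + b * x)
            grad_a += err
            grad_b += err * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def importance_weights(private_d, public_e):
    """Return one weight per record of E, estimating p_D(x) / p_E(x).

    Assumption: the classifier odds p(D | x) / p(E | x) equal exp(a + b*x),
    and multiplying by |E| / |D| corrects for the two set sizes.
    """
    xs = private_d + public_e
    ys = [1] * len(private_d) + [0] * len(public_e)
    a, b = fit_logistic(xs, ys)
    size_ratio = len(public_e) / len(private_d)
    return [size_ratio * math.exp(a + b * x) for x in public_e]


# Synthetic stand-ins: D is private, E is public and from a similar domain.
rng = random.Random(0)
private_d = [rng.gauss(1.0, 1.0) for _ in range(2000)]
public_e = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Computed by the owner of D; only `weights` would be published alongside E.
weights = importance_weights(private_d, public_e)

# An analyst who sees only E and the weights estimates a statistic of D.
weighted_mean_e = sum(w * x for w, x in zip(weights, public_e)) / sum(weights)
plain_mean_e = sum(public_e) / len(public_e)
mean_d = sum(private_d) / len(private_d)

print("mean of D (private):     ", round(mean_d, 3))
print("unweighted mean of E:    ", round(plain_mean_e, 3))
print("importance-weighted mean:", round(weighted_mean_e, 3))
```

In this toy setting the weighted mean of E should land much closer to the mean of D than the unweighted mean does, which is the sense in which mining E with the weights approximates mining D. The same weights could be passed to any learning algorithm that accepts per-record sample weights.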