Privacy in data mining using formal methods

  • Authors:
  • Stan Matwin; Amy Felty; István Hernádvölgyi; Venanzio Capretta

  • Affiliations:
  • SITE, University of Ottawa, Canada; SITE, University of Ottawa, Canada; Siemens PSE, Hungary; Department of Mathematics and Statistics, University of Ottawa, Canada

  • Venue:
  • TLCA '05: Proceedings of the 7th International Conference on Typed Lambda Calculi and Applications
  • Year:
  • 2005

Abstract

There is growing public concern about personal data collected by both private and public sectors. People have very little control over what kinds of data are stored and how such data is used. Moreover, the ability to infer new knowledge from existing data is increasing rapidly with advances in database and data mining technologies. We describe a solution that allows people to take control by specifying constraints on the ways in which their data can be used. User constraints are represented in formal logic, and organizations that want to use this data provide formal proofs that the software they use to process the data meets these constraints. Checking the proof by an independent verifier demonstrates that user constraints are (or are not) respected by this software. Our notion of “privacy correctness” differs from general software correctness in two ways. First, the properties of interest are simpler, and thus their proofs should be easier to automate. Second, this kind of correctness is stricter: in addition to showing that a certain relation between input and output is realized, we must also show that only operations respecting the privacy constraints are applied during execution. We therefore have an intensional notion of correctness, rather than the usual extensional one. We discuss how our mechanism can be put into practice, and we present the technical aspects via an example. Our example shows how users can exercise control when their data is to be used as input to a decision tree learning algorithm. We have formalized the example and the proof of preservation of privacy constraints in Coq.
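
To make the intensional notion concrete, the following is a minimal Coq sketch of the idea; the names and the operation vocabulary here are hypothetical illustrations, not the paper's actual development. A user constraint is modeled as a predicate over the operations a program may apply, and privacy correctness demands that every operation in an execution trace satisfies it:

    (* Hypothetical illustration: a user constraint as a predicate over
       operations, and privacy correctness as the demand that every
       operation actually applied during execution is permitted. *)

    Require Import List.
    Import ListNotations.

    (* Operations a data-mining program may apply to a field,
       identified here by a number. *)
    Inductive op : Type :=
      | Read (field : nat)
      | Aggregate (field : nat)
      | Disclose (field : nat).

    (* A user constraint says which operations are permitted. *)
    Definition constraint := op -> Prop.

    (* Example: field 0 (say, income) may be read or aggregated,
       but never disclosed. *)
    Definition no_disclose_income : constraint :=
      fun o => match o with
               | Disclose 0 => False
               | _ => True
               end.

    (* Intensional privacy correctness: every operation in the
       execution trace satisfies the constraint. *)
    Definition respects (c : constraint) (trace : list op) : Prop :=
      Forall c trace.

    (* A trace that reads field 1 and aggregates field 0 is admissible. *)
    Example ok_trace : respects no_disclose_income [Read 1; Aggregate 0].
    Proof. repeat constructor. Qed.

The paper's actual formalization concerns a decision tree learner rather than this toy vocabulary, but the shape is the same: correctness is stated over the operations performed during execution, not only over the input/output relation.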