Inference control to protect sensitive information in text documents

  • Authors:
  • Chad Cumby; Rayid Ghani

  • Affiliations:
  • Accenture Technology Labs, Chicago; Accenture Technology Labs, Chicago

  • Venue:
  • ACM SIGKDD Workshop on Intelligence and Security Informatics
  • Year:
  • 2010

Abstract

This paper describes a framework and algorithms for ensuring a specified level of privacy in text data sets. Recent work has attempted to quantify the likelihood of privacy breaches in text data. We build on these notions to provide a means of controlling such breaches, framed as a multi-class classification problem. Our framework, called Text Inference Control, gives the user fine-grained control over the level of privacy required for sensitive concepts present in the data. Our framework is also designed to respect a user-defined utility metric on the data, which our methods attempt to maximize while redacting. Beyond the framework and algorithms themselves, we show encouraging results on multiple data sets: the sensitive category is protected while as much of the utility category as possible is preserved, against both automated attackers and human subjects.
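The abstract does not spell out the redaction algorithms, so the following is only a rough illustration of the trade-off it describes, not the authors' method. It assumes two hypothetical linear scorers given as per-word weight dictionaries: one for the sensitive category (the attacker's view) and one for the utility category. A greedy heuristic then redacts whole word types, each time choosing the word that removes the most sensitive score per unit of utility score lost. It stops once the sensitive score falls to a user-chosen threshold. All names, weights, and the stopping rule are assumptions made for this sketch.

```python
from typing import Dict, List, Optional, Tuple


def score(tokens: List[str], weights: Dict[str, float], bias: float = 0.0) -> float:
    """Hypothetical linear scorer: bias plus the sum of per-token weights."""
    return bias + sum(weights.get(t, 0.0) for t in tokens)


def greedy_redact(
    tokens: List[str],
    sensitive_w: Dict[str, float],
    utility_w: Dict[str, float],
    threshold: float,
) -> Tuple[List[str], List[str]]:
    """Redact word types until the sensitive score is <= threshold (illustrative heuristic)."""
    kept = list(tokens)
    redacted: List[str] = []
    while score(kept, sensitive_w) > threshold:
        best: Optional[str] = None
        best_ratio = float("-inf")
        # Iterate in sorted order so that ties are broken deterministically.
        for t in sorted(set(kept)):
            n = kept.count(t)
            gain = sensitive_w.get(t, 0.0) * n  # sensitive score removed by redacting t
            if gain <= 0:
                continue  # redacting t would not lower the sensitive score
            # Utility lost; negative utility weights are treated as zero cost.
            cost = max(utility_w.get(t, 0.0) * n, 0.0)
            ratio = gain / (cost + 1e-9)
            if ratio > best_ratio:
                best, best_ratio = t, ratio
        if best is None:
            break  # no remaining word lowers the sensitive score
        kept = [t for t in kept if t != best]  # remove every occurrence of the word
        redacted.append(best)
    return kept, redacted


# Usage with made-up weights (purely illustrative).
doc = "patient diagnosed with diabetes visited clinic for insulin billing".split()
sensitive = {"diabetes": 2.0, "insulin": 1.5, "clinic": 0.5}
utility = {"clinic": 1.0, "billing": 2.0, "insulin": 0.2}
kept, redacted = greedy_redact(doc, sensitive, utility, threshold=1.0)
print(redacted)  # → ['diabetes', 'insulin']
```

In this toy run, "clinic" survives because it carries utility weight and only a small amount of sensitive weight. This is the kind of trade-off a utility-aware redactor is meant to make. A real system would learn both scorers from labeled data, and would likely measure privacy with a calibrated classifier rather than a raw score threshold.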