Privacy leakage in multi-relational learning via unwanted classification models

  • Authors:
  • Hongyu Guo; Herna L. Viktor; Eric Paquet

  • Affiliations:
  • Hongyu Guo: Institute for Information Technology, National Research Council of Canada; Herna L. Viktor: University of Ottawa; Eric Paquet: Institute for Information Technology, National Research Council of Canada, and University of Ottawa

  • Venue:
  • Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research
  • Year:
  • 2011

Abstract

Multi-relational classification algorithms aim to discover patterns across multiple interlinked tables in a relational database. When the database schema is complex, however, it becomes difficult to identify all possible relationships between attributes, because the database typically contains a large number of attributes spread across interconnected tables with non-determinate (e.g., one-to-many) relationships. A set of seemingly harmless attributes drawn from multiple tables may therefore be used to learn unwanted classification models that accurately predict confidential information, leading to data leaks. Moreover, eliminating or distorting confidential attributes may be insufficient to prevent such disclosure, since their values may still be inferred from prior insider knowledge. This paper proposes an approach to identify such "dangerous" attribute sets. For data publishing, our method generates a ranked list of subschemas that maintain predictive performance on the class attribute while limiting the disclosure risk of confidential attributes, that is, the accuracy with which unwanted models can predict them. We demonstrate the effectiveness of our method on several databases.
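
The sketch below is a minimal, hypothetical illustration of the idea described in the abstract, not the authors' algorithm or code. It measures how accurately a confidential attribute can be predicted from seemingly harmless attributes (an "unwanted model") and ranks candidate attribute subsets by target accuracy minus disclosure risk. The functions rank_subschemas and predictability, the column names, the toy data, and the trade-off weight alpha are all assumptions made for this example; a real multi-relational setting would first aggregate one-to-many links into features before scoring.

    # Hypothetical sketch of ranking subschemas by utility vs. disclosure risk.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def predictability(df, features, label, cv=5):
        """Mean cross-validated accuracy of predicting `label` from `features`."""
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        return cross_val_score(clf, df[features], df[label], cv=cv).mean()

    def rank_subschemas(df, subschemas, target, confidential, alpha=1.0):
        """Rank attribute subsets: reward accuracy on the class attribute,
        penalize the accuracy of an unwanted model for the confidential one."""
        scored = []
        for features in subschemas:
            utility = predictability(df, features, target)
            risk = predictability(df, features, confidential)
            scored.append((utility - alpha * risk, utility, risk, tuple(features)))
        return sorted(scored, reverse=True)

    # Toy single-table stand-in for an already-joined schema.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "age": rng.integers(18, 80, 500),
        "region": rng.integers(0, 5, 500),
        "order_count": rng.integers(0, 30, 500),
    })
    df["income_band"] = (df["order_count"] > 15).astype(int)  # confidential attribute
    df["churn"] = (df["age"] < 40).astype(int)                # class attribute

    for score, utility, risk, features in rank_subschemas(
            df, [["age", "region"], ["age", "region", "order_count"]],
            target="churn", confidential="income_band"):
        print(f"{features}: utility={utility:.2f}, disclosure risk={risk:.2f}")

In this toy example, the subschema that includes order_count scores well on the class attribute but also lets an unwanted model recover income_band almost perfectly, so the risk-penalized ranking prefers the smaller, safer subschema.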