Privacy leakage in multi-relational learning via unwanted classification models

Authors:
Hongyu Guo;Herna L. Viktor;Eric Paquet
Affiliations:
Institute for Information Technology, National Research Council of Canada;University of Ottawa;Institute for Information Technology, National Research Council of Canada, and University of Ottawa
Venue:
Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research
Year:
2011

Abstract

Multirelational classification algorithms aim to discover patterns across multiple interlinked tables in a relational database. In a complex database schema, however, it is difficult to identify all possible relationships between attributes, because a database often contains a very large number of attributes drawn from different interconnected tables with non-determinate (for example, one-to-many) relationships. As a result, a set of seemingly harmless attributes spread across multiple tables may be used to learn unwanted classification models that accurately predict confidential information, leading to data leaks. Furthermore, eliminating or distorting the confidential attributes may be insufficient to prevent such disclosure, since their values may still be inferred from prior insider knowledge. This paper proposes an approach to identify such "dangerous" attribute sets. For data publishing, our method generates a ranked list of subschemas that maintain predictive performance on the class attribute while limiting both the disclosure risk of confidential attributes and the accuracy with which they can be predicted. We demonstrate the effectiveness of our method on several databases.
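
The abstract does not spell out how the ranked list of subschemas is built, so the following is only a minimal sketch of one way such a ranking could work, not the authors' algorithm. It assumes that, for each candidate subschema, the accuracy of predicting the class attribute and the accuracy of predicting the confidential attribute have already been estimated by some classifier. The names `SubschemaScore`, `rank_subschemas`, and `max_utility_loss`, the table names, and all accuracy values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class SubschemaScore:
    """Hypothetical record for one candidate subschema (a subset of tables)."""
    tables: FrozenSet[str]        # tables kept in the published subschema
    target_accuracy: float        # accuracy on the class attribute
    confidential_accuracy: float  # accuracy on the confidential attribute


def rank_subschemas(
    candidates: List[SubschemaScore],
    full_target_accuracy: float,
    max_utility_loss: float = 0.05,
) -> List[SubschemaScore]:
    """Keep subschemas whose class-attribute accuracy stays within
    max_utility_loss of the full schema, then order them so that the
    subschema leaking least about the confidential attribute comes first.
    Ties are broken in favour of higher class-attribute accuracy."""
    usable = [
        c for c in candidates
        if full_target_accuracy - c.target_accuracy <= max_utility_loss
    ]
    return sorted(
        usable,
        key=lambda c: (c.confidential_accuracy, -c.target_accuracy),
    )


# Made-up accuracies, for illustration only (not results from the paper).
candidates = [
    SubschemaScore(frozenset({"loan", "account", "client"}), 0.90, 0.85),
    SubschemaScore(frozenset({"loan", "account"}), 0.88, 0.60),
    SubschemaScore(frozenset({"loan"}), 0.70, 0.52),
]

ranked = rank_subschemas(candidates, full_target_accuracy=0.90)
for s in ranked:
    print(sorted(s.tables), s.target_accuracy, s.confidential_accuracy)
```

In this sketch the single-table subschema is discarded because it loses too much accuracy on the class attribute, even though it leaks the least. Of the remaining two, the subschema without the `client` table ranks first because the confidential attribute is harder to predict from it. A hard utility threshold followed by a sort on leakage is one simple design; a weighted trade-off between the two accuracies would be an equally plausible alternative.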