Privacy leakage in multi-relational databases: a semi-supervised learning perspective

Authors:
Hui Xiong; Michael Steinbach; Vipin Kumar
Affiliations:
MSIS Department, Rutgers University, USA; Department of Computer Science and Engineering, University of Minnesota, USA; Department of Computer Science and Engineering, University of Minnesota, USA
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2006

Abstract

In multi-relational databases, a view, which is a context- and content-dependent subset of one or more tables (or other views), is often used to preserve privacy by hiding sensitive information. However, recent developments in data mining present a new challenge for database security, even when traditional database security techniques such as database access control are employed. This paper presents a data mining framework based on semi-supervised learning that demonstrates the potential for privacy leakage in multi-relational databases. Many semi-supervised learning techniques, such as the K-nearest neighbor (KNN) method, can be used to demonstrate privacy leakage. We also introduce a new approach to semi-supervised learning, hyperclique pattern-based semi-supervised learning (HPSL), which differs from traditional semi-supervised learning approaches in that it considers the similarity among groups of objects instead of only pairs of objects. Our experimental results show that both the KNN and HPSL methods can compromise database security, and that HPSL achieves higher prediction accuracy than KNN and thus compromises privacy more effectively. Finally, we provide a principle for avoiding privacy leakage in multi-relational databases via semi-supervised learning and illustrate this principle with a simple preventive technique whose effectiveness is demonstrated by experiments.
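To make the KNN-based leakage scenario concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes a hypothetical view that exposes three categorical attributes while hiding a sensitive `salary_band` column, and an adversary who already knows the hidden value for a handful of records. The function name `knn_infer`, the Hamming distance, the attribute names, and all data values are illustrative assumptions.

```python
from collections import Counter


def hamming(a, b):
    """Count the positions at which two equal-length records differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_infer(labeled, unlabeled, k=3):
    """Guess the hidden attribute of each unlabeled record by majority
    vote among its k nearest labeled records (Hamming distance)."""
    predictions = []
    for record in unlabeled:
        # Sort the labeled records by distance to the target and keep the k closest.
        neighbors = sorted(labeled, key=lambda pair: hamming(pair[0], record))[:k]
        votes = Counter(label for _, label in neighbors)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Hypothetical data: (department, grade, region) is visible through the view;
# the salary band is hidden, but known to the adversary for these five records.
labeled = [
    (("eng", "senior", "east"), "high"),
    (("eng", "senior", "west"), "high"),
    (("eng", "junior", "east"), "low"),
    (("ops", "junior", "west"), "low"),
    (("ops", "junior", "east"), "low"),
]

# Records whose salary band the view is supposed to protect.
unlabeled = [
    ("eng", "senior", "north"),
    ("ops", "junior", "north"),
]

predictions = knn_infer(labeled, unlabeled, k=3)
print(predictions)
```

In this toy setting the visible attributes correlate strongly with the hidden one, so a few known labels are enough to guess the rest. A preventive measure in the spirit of the abstract would aim to weaken that correlation in what the view exposes.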