Small Sample Size Effects in Statistical Pattern Recognition: Recommendations for Practitioners
IEEE Transactions on Pattern Analysis and Machine Intelligence
Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
WebACE: a Web agent for document categorization and exploration
AGENTS '98 Proceedings of the second international conference on Autonomous agents
Privacy-preserving data mining
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
A relational model of data for large shared data banks
Communications of the ACM
On the design and quantification of privacy preserving data mining algorithms
PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Recovering Information from Summary Data
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Limiting privacy breaches in privacy preserving data mining
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Privacy preserving mining of association rules
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining Strong Affinity Association Patterns in Data Sets with Skewed Support Distribution
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
On the Privacy Preserving Properties of Random Data Perturbation Techniques
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Prospects and challenges for multi-relational data mining
ACM SIGKDD Explorations Newsletter
Generalizing the notion of support
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
A Hybrid Approach for Mining Maixmal Hyperclique Patterns
ICTAI '04 Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence
Privacy and Ownership Preserving of Outsourced Medical Data
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Secure Third Party Distribution of XML Data
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Deriving private information from randomized data
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Privacy leakage in multi-relational databases via pattern based semi-supervised learning
Proceedings of the 14th ACM international conference on Information and knowledge management
IEEE Transactions on Information Theory - Part 2
Attacks on privacy and deFinetti's theorem
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Privacy-Preserving Data Publishing
Foundations and Trends in Databases
Privacy leakage in multi-relational learning via unwanted classification models
Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research
Hi-index | 0.00 |
In multi-relational databases, a view, which is a context- and content-dependent subset of one or more tables (or other views), is often used to preserve privacy by hiding sensitive information. However, recent developments in data mining present a new challenge for database security even when traditional database security techniques, such as database access control, are employed. This paper presents a data mining framework using semi-supervised learning that demonstrates the potential for privacy leakage in multi-relational databases. Many different types of semi-supervised learning techniques, such as the K-nearest neighbor (KNN) method, can be used to demonstrate privacy leakage. However, we also introduce a new approach to semi-supervised learning, hyperclique pattern-based semi-supervised learning (HPSL), which differs from traditional semi-supervised learning approaches in that it considers the similarity among groups of objects instead of only pairs of objects. Our experimental results show that both the KNN and HPSL methods have the ability to compromise database security, although the HPSL is better at this privacy violation (has higher prediction accuracy) than the KNN method. Finally, we provide a principle for avoiding privacy leakage in multi-relational databases via semi-supervised learning and illustrate this principle with a simple preventive technique whose effectiveness is demonstrated by experiments.