Multirelational classification algorithms aim to discover patterns across multiple interlinked tables in a relational database. In a complex database schema, however, it is difficult to identify all possible relationships between attributes. A database often contains a very large number of attributes drawn from different interconnected tables with non-determinate (e.g., one-to-many) relationships. A set of seemingly harmless attributes spread across multiple tables may therefore be used to learn unwanted classification models that accurately determine confidential information, leading to data leakage. Furthermore, eliminating or distorting confidential attributes may be insufficient to prevent such disclosure, since their values may still be inferred from prior insider knowledge. This paper proposes an approach to identify such "dangerous" attribute sets. For data publishing, our method generates a ranked list of subschemas that maintain predictive performance on the class attribute while limiting both the disclosure risk and the predictive accuracy for confidential attributes. We demonstrate the effectiveness of our method on several databases.
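The abstract does not spell out the search procedure, so the Python below is only a minimal sketch of the ranking criterion it describes. It is not the paper's algorithm. It assumes a hypothetical `evaluate` callable that returns two estimated accuracies for any candidate subschema: one on the class attribute and one on the confidential attribute. In practice these might come from cross-validated multirelational classifiers. Subsets whose confidential-attribute accuracy exceeds a chosen bound (the hypothetical `max_confidential_acc`) are flagged as "dangerous". The remaining ones are ranked by class-attribute accuracy. The element names and gain values in `TOY_GAINS` are made up for illustration and are not results.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical evaluator: given a subschema, return
# (accuracy on the class attribute, accuracy on the confidential attribute).
Evaluator = Callable[[FrozenSet[str]], Tuple[float, float]]


@dataclass(frozen=True)
class ScoredSubschema:
    elements: FrozenSet[str]   # tables or attributes kept in the published subschema
    class_acc: float           # estimated predictive accuracy for the class attribute
    confidential_acc: float    # estimated accuracy an adversary gets on the confidential attribute


def rank_subschemas(
    elements: Iterable[str],
    evaluate: Evaluator,
    max_confidential_acc: float,
) -> Tuple[List[ScoredSubschema], List[ScoredSubschema]]:
    """Split every non-empty subset into safe and dangerous ones; rank the safe ones.

    Exhaustive enumeration is exponential in the number of elements, so this
    only illustrates the ranking criterion on a tiny schema.
    """
    pool = sorted(set(elements))
    safe: List[ScoredSubschema] = []
    dangerous: List[ScoredSubschema] = []
    for k in range(1, len(pool) + 1):
        for subset in combinations(pool, k):
            chosen = frozenset(subset)
            class_acc, conf_acc = evaluate(chosen)
            scored = ScoredSubschema(chosen, class_acc, conf_acc)
            # "Dangerous": the subset predicts the confidential attribute
            # more accurately than the allowed bound.
            if conf_acc > max_confidential_acc:
                dangerous.append(scored)
            else:
                safe.append(scored)
    # Highest class accuracy first; ties go to the lower disclosure risk.
    safe.sort(key=lambda s: (-s.class_acc, s.confidential_acc))
    return safe, dangerous


# Made-up per-element accuracy gains (class, confidential), for illustration only.
TOY_GAINS = {"account": (0.10, 0.05), "client": (0.15, 0.30), "district": (0.05, 0.10)}


def toy_evaluate(subset: FrozenSet[str]) -> Tuple[float, float]:
    # Toy additive model standing in for real classifier evaluation.
    class_acc = 0.5 + sum(TOY_GAINS[e][0] for e in subset)
    conf_acc = 0.5 + sum(TOY_GAINS[e][1] for e in subset)
    return class_acc, conf_acc


safe, dangerous = rank_subschemas(TOY_GAINS, toy_evaluate, max_confidential_acc=0.70)
for s in safe:
    print(sorted(s.elements), round(s.class_acc, 2), round(s.confidential_acc, 2))
```

With these toy numbers, every subset containing `client` is flagged as dangerous, and the best publishable subschema keeps `account` and `district`. A real implementation would replace the additive toy model with actual classifier evaluation. It would also need a pruned or greedy search instead of full enumeration.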