Privacy-preserving data mining: A feature set partitioning approach

Authors:
Nissim Matatov;Lior Rokach;Oded Maimon
Affiliations:
Department of Industrial Engineering, Tel-Aviv University, Israel;Department of Information System Engineering, Ben Gurion University of the Negev, Be'er Sheva 84105, Israel and Deutsche Telekom Laboratories at Ben Gurion University of the Negev, Be'er Sheva 841 ...;Department of Industrial Engineering, Tel-Aviv University, Israel
Venue:
Information Sciences: an International Journal
Year:
2010

Abstract

In privacy-preserving data mining (PPDM), a widely used method for achieving data mining goals while preserving privacy is based on k-anonymity. This method protects subject-specific sensitive data by anonymizing it before it is released for data mining, and it requires that every tuple in the released table be indistinguishable from at least k subjects. The most common approach to achieving k-anonymity is to replace certain values with less specific but semantically consistent values. In this paper we propose a different approach: partitioning the original dataset into several projections such that each of them adheres to k-anonymity. Moreover, any attempt to rejoin the projections results in a table that still complies with k-anonymity. A classifier is trained on each projection, and an unlabelled instance is subsequently classified by combining the classifications of all classifiers. Guided by classification accuracy and k-anonymity constraints, the proposed data mining privacy by decomposition (DMPD) algorithm uses a genetic algorithm to search for an optimal feature set partitioning. DMPD was evaluated on ten separate datasets in order to compare its classification performance with that of other k-anonymity-based methods. The results suggest that DMPD performs better than existing k-anonymity-based algorithms and does not require domain-dependent knowledge. Using multiobjective optimization methods, we also examine the tradeoff between the two conflicting objectives in PPDM: privacy and predictive performance.
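
To make the k-anonymity constraint on projections concrete, the following is a minimal Python sketch, not the DMPD algorithm itself. It assumes a table held as a list of dictionaries, treats every listed attribute as a quasi-identifier, and keeps duplicate rows when projecting. The function names `is_k_anonymous` and `partition_is_k_anonymous` and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def is_k_anonymous(rows, attributes, k):
    """Return True if every value combination over `attributes`
    appears in at least k rows (vacuously True for an empty table)."""
    counts = Counter(tuple(row[a] for a in attributes) for row in rows)
    return all(c >= k for c in counts.values())


def partition_is_k_anonymous(rows, partition, k):
    """Return True if each attribute subset in `partition`
    yields a k-anonymous projection of `rows`."""
    return all(is_k_anonymous(rows, subset, k) for subset in partition)


# Toy table (illustrative data only, not taken from the paper).
rows = [
    {"age": 30, "zip": "111", "sex": "F"},
    {"age": 30, "zip": "222", "sex": "F"},
    {"age": 40, "zip": "111", "sex": "M"},
    {"age": 40, "zip": "222", "sex": "M"},
]

# The full table is not 2-anonymous: every (age, zip, sex) combination is unique.
print(is_k_anonymous(rows, ["age", "zip", "sex"], 2))  # False

# Splitting the features into two projections makes each one 2-anonymous.
print(partition_is_k_anonymous(rows, [["age", "sex"], ["zip"]], 2))  # True
```

In this toy case no value is generalized or suppressed, yet each projection satisfies the constraint. The sketch checks each projection separately; it does not model the rejoining of projections, the per-projection classifiers, or the genetic search described in the abstract.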