Anonymizing classification data using rough set theory

Authors:
Mingquan Ye;Xindong Wu;Xuegang Hu;Donghui Hu
Affiliations:
Department of Computer Science, Hefei University of Technology, Hefei 230009, PR China and Department of Computer Science, Wannan Medical College, Wuhu 241002, PR China;Department of Computer Science, Hefei University of Technology, Hefei 230009, PR China and Department of Computer Science, University of Vermont, Burlington, VT 05405, USA;Department of Computer Science, Hefei University of Technology, Hefei 230009, PR China;Department of Computer Science, Hefei University of Technology, Hefei 230009, PR China
Venue:
Knowledge-Based Systems
Year:
2013

Abstract

Identity disclosure is one of the most serious privacy concerns in many data mining applications. A well-known privacy model for protecting against identity disclosure is k-anonymity. The main goal of anonymizing classification data is to protect individual privacy while preserving the utility of the data for building classification models. In this paper, we present a rough set-based approach for measuring data quality and guiding anonymization operations. First, we draw on the attribute reduction theory of rough sets and introduce conditional entropy to measure the classification quality of anonymized datasets. Then, we extend conditional entropy under single-level granulation to hierarchical conditional entropy under multi-level granulation, and study its properties as attribute values are dynamically coarsened and refined. Guided by these properties, we develop an efficient search metric and present a novel algorithm for achieving k-anonymity, Hierarchical Conditional Entropy-based Top-Down Refinement (HCE-TDR), which combines rough set theory and attribute value taxonomies. Theoretical analysis and experiments on real-world datasets show that our algorithm is efficient and improves data utility.
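
To make the two quantities in the abstract concrete, the following minimal sketch checks k-anonymity over a set of quasi-identifiers and computes a conditional entropy of the class attribute given those attributes. It is an illustration under stated assumptions, not the HCE-TDR algorithm: the Shannon form of conditional entropy is assumed (the paper may define it differently), and the toy records, the attribute names, the two taxonomy maps, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def equivalence_classes(rows, attrs):
    """Partition row indices by their values on attrs (indiscernibility)."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return list(groups.values())


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every equivalence class on the quasi-identifiers has >= k rows."""
    return all(len(g) >= k for g in equivalence_classes(rows, quasi_identifiers))


def conditional_entropy(rows, cond_attrs, decision):
    """Shannon conditional entropy H(decision | cond_attrs), in bits.

    Assumed definition; lower values mean the condition attributes
    determine the class label more precisely.
    """
    n = len(rows)
    total = 0.0
    for group in equivalence_classes(rows, cond_attrs):
        counts = Counter(rows[i][decision] for i in group)
        size = len(group)
        h = -sum((c / size) * log2(c / size) for c in counts.values())
        total += (size / n) * h
    return total


def generalize(rows, attr, child_to_parent):
    """Coarsen one attribute by replacing each value with its taxonomy parent."""
    return [{**r, attr: child_to_parent.get(r[attr], r[attr])} for r in rows]


# Hypothetical toy data: "job" and "sex" are quasi-identifiers, "income" is the class.
rows = [
    {"job": "engineer", "sex": "M", "income": "high"},
    {"job": "lawyer", "sex": "M", "income": "high"},
    {"job": "dancer", "sex": "F", "income": "low"},
    {"job": "writer", "sex": "F", "income": "low"},
]
qi = ["job", "sex"]

# Hypothetical two-level attribute value taxonomy for "job".
level1 = {"engineer": "professional", "lawyer": "professional",
          "dancer": "artist", "writer": "artist"}
level2 = {"professional": "any", "artist": "any"}

mid = generalize(rows, "job", level1)                          # one level coarser
top = generalize(generalize(mid, "job", level2), "sex", {"M": "any", "F": "any"})

for name, data in [("raw", rows), ("mid", mid), ("top", top)]:
    print(name, is_k_anonymous(data, qi, 2), conditional_entropy(data, qi, "income"))
```

In this toy case the intermediate level satisfies 2-anonymity without raising the conditional entropy, whereas generalizing everything to the root mixes both classes in one group. A top-down refinement search has to navigate this trade-off between coarse and fine attribute values.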