Minimizing Information Loss and Preserving Privacy

Authors:
Syam Menon; Sumit Sarkar
Affiliations:
School of Management, The University of Texas at Dallas, Richardson, Texas 75083 (both authors)
Venue:
Management Science
Year:
2007

Abstract

The need to hide sensitive information before sharing databases has long been recognized. In the context of data mining, sensitive information often takes the form of itemsets that need to be suppressed before the data is released. This paper considers the problem of minimizing the number of nonsensitive itemsets lost while concealing sensitive ones. This problem is shown to be an intractably large version of an NP-hard problem. Consequently, a two-phased procedure that involves the solution of two smaller NP-hard problems is proposed as a practical and effective alternative. In the first phase, a procedure to solve a sanitization problem identifies how the support for sensitive itemsets could be eliminated from a specific transaction by removing the fewest items from it. This leads to a modified frequent itemset hiding problem, where transactions to be sanitized are selected such that the number of nonsensitive itemsets lost, while concealing sensitive ones, is minimized. Heuristic procedures are developed for these problems using intuition derived from their integer programming formulations. Results from computational experiments conducted on a publicly available retail data set and three large data sets generated using IBM's synthetic data generator indicate that these approaches are very effective, solving problems involving up to 10 million transactions in a short period of time. The results also show that the process of sanitization has considerable bearing on the quality of solutions obtained.
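
To make the first-phase sanitization problem concrete, the following is a minimal sketch of one way it could be approached. It is not the heuristic developed in the paper, which the abstract does not spell out. It assumes transactions and itemsets are plain sets of item names and uses a generic greedy hitting-set rule: repeatedly remove the item that appears in the most sensitive itemsets still supported by the transaction. The names `sanitize` and `support` are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def sanitize(transaction: Set[str], sensitive: Iterable[Itemset]) -> Set[str]:
    """Return a copy of `transaction` that supports no sensitive itemset.

    Assumption: a greedy hitting-set heuristic, not the paper's procedure.
    It tries to remove few items but does not guarantee the minimum.
    """
    result = set(transaction)
    # Sensitive itemsets currently supported by (contained in) the transaction.
    live: List[Itemset] = [s for s in sensitive if s and s <= result]
    while live:
        # Count how many still-supported sensitive itemsets each item occurs in.
        counts: Dict[str, int] = {}
        for s in live:
            for item in s:
                counts[item] = counts.get(item, 0) + 1
        # Pick the item hitting the most itemsets; break ties by name.
        victim = min(counts, key=lambda i: (-counts[i], i))
        result.discard(victim)
        # Itemsets containing the removed item are no longer supported.
        live = [s for s in live if victim not in s]
    return result


def support(database: Iterable[Set[str]], itemset: Itemset) -> int:
    """Number of transactions in `database` that contain every item of `itemset`."""
    return sum(1 for t in database if itemset <= t)


if __name__ == "__main__":
    sensitive = [frozenset({"beer", "diapers"}), frozenset({"beer", "milk"})]
    t = {"bread", "milk", "beer", "diapers"}
    cleaned = sanitize(t, sensitive)
    print(sorted(cleaned))
    print(support([cleaned], frozenset({"beer", "diapers"})))
```

In this toy example, removing the single item "beer" eliminates support for both sensitive itemsets, whereas removing "diapers" and "milk" would cost two items. The second phase described in the abstract, choosing which transactions to sanitize so that the fewest nonsensitive itemsets are lost, is not covered by this sketch.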