MICF: An effective sanitization algorithm for hiding sensitive patterns on data mining

Authors:
Yu-Chiang Li;Jieh-Shan Yeh;Chin-Chen Chang
Affiliations:
Department of Computer Science and Information Engineering, National Chung Cheng University, 168 University Road, San-Hsing, Min-Hsiung, Chiayi 62102, Taiwan, ROC;Department of Computer Science and Information Management, Providence University, 200 Chung-Chi, Shalu, Taichung 433, Taiwan, ROC;Department of Computer Science and Information Engineering, National Chung Cheng University, 168 University Road, San-Hsing, Min-Hsiung, Chiayi 62102, Taiwan, ROC and Department of Information Eng ...
Venue:
Advanced Engineering Informatics
Year:
2007

Abstract

Data mining mechanisms have been widely applied in businesses and manufacturing companies across many industry sectors. Sharing data or mined rules has become a trend among business partners, as it is perceived to be a mutually beneficial way of increasing productivity for all parties involved. Nevertheless, this has also increased the risk of unexpected information leaks when releasing data. To conceal restrictive itemsets (patterns) contained in the source database, a sanitization process transforms the source database into a released database from which the counterpart cannot extract sensitive rules. As an unwanted consequence, the transformation also conceals some non-restrictive information; this side effect is called the "misses cost". The problem of finding an optimal sanitization method, one that conceals all restrictive itemsets while minimizing the misses cost, is NP-hard. To address this challenging problem, this study proposes the maximum item conflict first (MICF) algorithm. Experimental results demonstrate that the proposed method is effective, has a low sanitization rate, and generally achieves a significantly lower misses cost than the MinFIA, MaxFIA, IGA and Algo2b methods on several real and artificial datasets.
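To make the terms in the abstract concrete, the following is a minimal Python sketch of a sanitization pass and of the misses cost on a toy database. It is not the MICF algorithm from the paper, whose details the abstract does not give. The greedy victim choice (delete the item that occurs in the most still-frequent restrictive itemsets of a transaction), the definition of misses cost as the fraction of non-restrictive frequent itemsets lost, and all function names are assumptions made for illustration.

```python
from itertools import combinations


def support(db, itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def frequent_itemsets(db, min_support):
    """Brute-force all itemsets with support >= min_support (toy sizes only)."""
    items = sorted(set().union(*db))
    found = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            if support(db, candidate) >= min_support:
                found.add(candidate)
    return found


def sanitize(db, restrictive, min_support):
    """Hypothetical greedy sanitizer, NOT the published MICF procedure.

    Deletes items from transactions until no restrictive itemset
    reaches min_support in the released database.
    """
    released = [set(t) for t in db]
    for t in released:
        while True:
            # Restrictive itemsets supported by t that are still frequent.
            live = [r for r in restrictive
                    if r <= t and support(released, r) >= min_support]
            if not live:
                break
            # Assumed heuristic: remove the item shared by the most live
            # restrictive itemsets (ties broken alphabetically).
            candidates = sorted(set().union(*live))
            victim = max(candidates, key=lambda i: sum(i in r for r in live))
            t.remove(victim)
    return [frozenset(t) for t in released]


def misses_cost(db, released, restrictive, min_support):
    """Assumed definition: fraction of non-restrictive frequent itemsets lost."""
    def non_restrictive(d):
        return {s for s in frequent_itemsets(d, min_support)
                if not any(r <= s for r in restrictive)}
    before = non_restrictive(db)
    after = non_restrictive(released)
    return len(before - after) / len(before) if before else 0.0


# Toy source database: five transactions over items a, b, c.
db = [frozenset("abc"), frozenset("abc"), frozenset("ab"),
      frozenset("bc"), frozenset("ac")]
restrictive = [frozenset("ab")]  # the pattern to hide
MIN_SUPPORT = 2                  # absolute support threshold

released = sanitize(db, restrictive, MIN_SUPPORT)
print("support of {a, b} after sanitization:",
      support(released, frozenset("ab")))
print("misses cost:", misses_cost(db, released, restrictive, MIN_SUPPORT))
```

In this toy run the restrictive itemset {a, b} drops below the support threshold, but the non-restrictive itemset {a, c} is also lost along the way. That loss is the side effect the abstract calls the misses cost, and reducing it is what distinguishes one victim-selection strategy from another.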