Using TF-IDF to hide sensitive itemsets

Authors:
Tzung-Pei Hong; Chun-Wei Lin; Kuo-Tung Yang; Shyue-Liang Wang
Affiliations:
Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan and Department of Computer Science and Engineering, National Sun Yat-sen University, ...; Innovative Information Industry Research Center (IIIRC), School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, P.R. China 518055; Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan; Department of Information Management, National University of Kaohsiung, Kaohsiung, Taiwan
Venue:
Applied Intelligence
Year:
2013



Abstract

Data mining technology helps extract usable knowledge from large data sets. The collection and dissemination of data, however, carry an inherent risk to privacy. Sensitive or private information about individuals, businesses, and organizations needs to be suppressed before the data are shared or published. Privacy-preserving data mining (PPDM) has thus become an important issue in recent years. In this paper, we propose an algorithm called SIF-IDF that modifies an original database in order to hide sensitive itemsets. It is a greedy approach based on a concept borrowed from Term Frequency and Inverse Document Frequency (TF-IDF) in text mining. This concept is used to evaluate the degree of similarity between the items in each transaction and the sensitive itemsets to be hidden, and appropriate items are then selected for hiding in some transactions. The proposed algorithm offers a good trade-off between privacy preservation and execution time. Experimental results are also presented to show the performance of the proposed approach.
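
The greedy idea described in the abstract can be illustrated with a minimal Python sketch. It is not the SIF-IDF algorithm itself: the abstract does not give the scoring formula, so the weighting used here (how often an item occurs in the still-frequent sensitive itemsets, multiplied by a logarithmic inverse database frequency) is an assumption made for illustration only. The names `support`, `hide_sensitive_itemsets`, and `min_count` are likewise hypothetical.

```python
import math


def support(transactions, itemset):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def hide_sensitive_itemsets(transactions, sensitive, min_count):
    """Greedy sanitization sketch (assumed scoring, not the paper's formula).

    Removes items from a copy of the database until every sensitive
    itemset has a support count below min_count.
    """
    if min_count < 1:
        raise ValueError("min_count must be at least 1")
    db = [set(t) for t in transactions]  # work on a copy
    sensitive = [frozenset(s) for s in sensitive]
    n = len(db)

    while True:
        # Sensitive itemsets that are still frequent and must be hidden.
        exposed = [s for s in sensitive if support(db, s) >= min_count]
        if not exposed:
            return db

        # TF-like part: occurrences of each item in the exposed itemsets.
        sens_freq = {}
        for s in exposed:
            for item in s:
                sens_freq[item] = sens_freq.get(item, 0) + 1

        # IDF-like part: items that are rare in the database weigh more.
        # The "+ 1.0" keeps every weight positive (an assumption).
        def idf(item):
            df = sum(1 for t in db if item in t)
            return math.log(n / df) + 1.0

        # Pick the highest-scoring transaction that still supports
        # at least one exposed sensitive itemset.
        best_idx, best_score = None, -1.0
        for i, t in enumerate(db):
            if not any(s <= t for s in exposed):
                continue
            score = sum(sens_freq[x] * idf(x) for x in t if x in sens_freq)
            if score > best_score:
                best_idx, best_score = i, score

        # In that transaction, delete the item shared by the most
        # exposed itemsets it contains (ties broken by sort order).
        target = db[best_idx]
        counts = {}
        for s in exposed:
            if s <= target:
                for item in s:
                    counts[item] = counts.get(item, 0) + 1
        victim = max(sorted(counts), key=counts.get)
        target.discard(victim)


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
sanitized = hide_sensitive_itemsets(transactions, [{"a", "b"}], min_count=2)
```

Each iteration lowers the support of at least one still-frequent sensitive itemset by one, so the loop terminates. Only item deletion is modeled here; side effects on non-sensitive itemsets, which a practical sanitization method would also need to measure, are not tracked in this sketch.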