A heuristic data-sanitization approach based on TF-IDF

Authors:
Tzung-Pei Hong;Chun-Wei Lin;Kuo-Tung Yang;Shyue-Liang Wang
Affiliations:
Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan and Department of Computer Science and Engineering, National Sun Yat-sen University, ...;Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan and General Education Center, National University of Kaohsiung, Kaohsiung, Taiwan;Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan;Department of Information Management, National University of Kaohsiung, Kaohsiung, Taiwan
Venue:
IEA/AIE'11 Proceedings of the 24th international conference on Industrial engineering and other applications of applied intelligent systems conference on Modern approaches in applied intelligence - Volume Part I
Year:
2011

Abstract

Data mining can extract useful knowledge from large data sets. The collection and dissemination of data, however, carry an inherent risk of privacy threats. In this paper, the SIF-IDF algorithm is proposed to modify original databases in order to hide sensitive itemsets. It is a greedy approach based on the concept of term frequency and inverse document frequency (TF-IDF) borrowed from text mining. Experimental results are also reported to show the performance of the proposed approach.
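
The abstract gives no formulas, so the following is only a minimal sketch of how a greedy, TF-IDF-style sanitization could look. It is not the authors' SIF-IDF algorithm. The scoring function, the victim-item rule, and the names `support`, `tfidf_score`, and `sanitize` are all assumptions made for illustration. Transactions are treated as documents and sensitive items as terms. The highest-scoring transaction that still supports a sensitive itemset is modified first, until that itemset's support count drops below a threshold.

```python
import math


def support(db, itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def tfidf_score(t, db, sensitive):
    """Illustrative TF-IDF-style score of transaction `t` (an assumption,
    not the paper's formula): sum over the sensitive items in `t` of
    (share of the transaction) * log(N / database frequency)."""
    sensitive_items = set().union(*sensitive)
    n = len(db)
    score = 0.0
    for item in sensitive_items & t:
        tf = 1 / len(t)                       # item's share of the transaction
        df = sum(1 for u in db if item in u)  # transactions containing the item
        score += tf * math.log(n / df)
    return score


def sanitize(db, sensitive, min_support):
    """Greedily delete items until every sensitive itemset has a support
    count below `min_support` (an absolute count, assumed to be >= 1)."""
    db = [set(t) for t in db]  # work on a copy; the input is left untouched
    for s in sensitive:
        while support(db, s) >= min_support:
            # Only transactions that still contain the whole itemset matter.
            candidates = [t for t in db if s <= t]
            victim = max(candidates, key=lambda t: tfidf_score(t, db, sensitive))
            # Hypothetical rule: drop the itemset's most frequent item.
            item = max(sorted(s), key=lambda i: sum(1 for u in db if i in u))
            victim.discard(item)
    return db


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "d"}]
sanitized = sanitize(transactions, [frozenset({"a", "b"})], min_support=2)
```

Each pass of the inner loop removes one item from one supporting transaction, so the support count of the sensitive itemset falls by exactly one per pass and the loop terminates. Side effects on non-sensitive itemsets are not measured in this sketch.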