Using TF-IDF to hide sensitive itemsets

  • Authors: Tzung-Pei Hong; Chun-Wei Lin; Kuo-Tung Yang; Shyue-Liang Wang

  • Affiliations: Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan and Department of Computer Science and Engineering, National Sun Yat-sen University, ...; Innovative Information Industry Research Center (IIIRC), School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, P.R. China 518055; Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan; Department of Information Management, National University of Kaohsiung, Kaohsiung, Taiwan

  • Venue: Applied Intelligence
  • Year: 2013

Abstract

Data mining technology helps extract usable knowledge from large data sets. The process of collecting and disseminating data, however, carries an inherent risk of privacy threats. Sensitive or private information about individuals, businesses, and organizations needs to be suppressed before the data are shared or published. Privacy-preserving data mining (PPDM) has thus become an important issue in recent years. In this paper, we propose an algorithm called SIF-IDF that modifies an original database in order to hide sensitive itemsets. It is a greedy approach based on a concept borrowed from Term Frequency and Inverse Document Frequency (TF-IDF) in text mining. This concept is used to evaluate the degree of similarity between the items in each transaction and the sensitive itemsets to be hidden; appropriate items in some transactions are then selected and hidden. The proposed algorithm can easily make good trade-offs between privacy preservation and execution time. Experimental results also show the performance of the proposed approach.
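
To make the idea concrete, the following is a minimal Python sketch of a greedy, TF-IDF-style hiding procedure. It is not the SIF-IDF algorithm itself, since the abstract does not give its formulas. The scoring rule (frequency of an item among the still-exposed sensitive itemsets, multiplied by a log inverse-frequency weight over the database), the stopping threshold `min_count`, and the function names `support` and `hide_sensitive` are all assumptions made for illustration.

```python
import math
from collections import Counter


def support(db, itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def hide_sensitive(db, sensitive, min_count):
    """Greedily delete items until each sensitive itemset has support < min_count.

    Illustrative assumption, not the paper's exact method: a transaction is
    scored by summing, over its items, (occurrences of the item in the exposed
    sensitive itemsets) * log(n / number of transactions containing the item).
    """
    db = [set(t) for t in db]  # work on a copy; the input is left untouched
    n = len(db)
    while True:
        exposed = [s for s in sensitive if support(db, s) >= min_count]
        if not exposed:
            return db
        # TF analogue: how often each item appears in the exposed itemsets.
        sif = Counter(i for s in exposed for i in s)
        # IDF analogue: items that are rare in the database weigh more.
        df = Counter(i for t in db for i in t)

        def score(t):
            return sum(sif[i] * math.log(n / df[i]) for i in t)

        # Only transactions that fully contain an exposed itemset can help.
        candidates = [t for t in db if any(s <= t for s in exposed)]
        target = max(candidates, key=score)
        # Remove the item shared by the most exposed itemsets (ties: by name).
        removable = {i for s in exposed if s <= target for i in s}
        victim = max(sorted(removable), key=lambda i: sif[i])
        target.discard(victim)


if __name__ == "__main__":
    transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    secret = [frozenset({"a", "b"})]
    sanitized = hide_sensitive(transactions, secret, min_count=2)
    print(support(sanitized, secret[0]))
```

Each iteration removes one item from a transaction that supports an exposed itemset, so the total support of the sensitive itemsets strictly decreases and the loop terminates. A real implementation would also need to limit side effects on non-sensitive frequent itemsets, which this sketch ignores.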