A heuristic data-sanitization approach based on TF-IDF

  • Authors:
  • Tzung-Pei Hong;Chun-Wei Lin;Kuo-Tung Yang;Shyue-Liang Wang

  • Affiliations:
  • Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan and Department of Computer Science and Engineering, National Sun Yat-sen University, ...;Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan and General Education Center, National University of Kaohsiung, Kaohsiung, Taiwan;Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan;Department of Information Management, National University of Kaohsiung, Kaohsiung, Taiwan

  • Venue:
  • IEA/AIE'11 Proceedings of the 24th international conference on Industrial engineering and other applications of applied intelligent systems conference on Modern approaches in applied intelligence - Volume Part I
  • Year:
  • 2011

Abstract

Data mining technology can help extract useful knowledge from large data sets. However, the processes of data collection and data dissemination carry an inherent risk of privacy threats. In this paper, the SIF-IDF algorithm is proposed to modify original databases in order to hide sensitive itemsets. It is a greedy approach based on the concept of Term Frequency and Inverse Document Frequency (TF-IDF), borrowed from text mining. Experimental results are also presented to evaluate the performance of the proposed approach.
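The abstract does not define the SIF-IDF weighting, so the following is only a minimal sketch of a greedy, TF-IDF-style sanitization loop under assumed definitions. It is not the authors' algorithm. Transactions are Python sets, and `support`, `tfidf_score`, `sanitize`, and `min_count` are hypothetical names. The assumed score of a transaction sums, over every sensitive itemset it contains, a tf-like term (itemset length divided by transaction length) multiplied by an idf-like term log(N / support). The loop repeatedly picks the highest-scoring transaction that supports a violating itemset and deletes one of that itemset's items, stopping once every sensitive itemset falls below `min_count`.

```python
import math
from typing import FrozenSet, Iterable, List, Set


def support(db: List[Set[str]], itemset: FrozenSet[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def tfidf_score(t: Set[str], sensitive: List[FrozenSet[str]], db: List[Set[str]]) -> float:
    """Hypothetical TF-IDF-style weight of transaction `t` (an assumption, not SIF-IDF)."""
    n = len(db)
    score = 0.0
    for s in sensitive:
        if s <= t:
            tf = len(s) / len(t)                  # share of t covered by s
            idf = math.log(n / support(db, s))    # weakly supported itemsets weigh more
            score += tf * idf
    return score


def sanitize(db: Iterable[Iterable[str]],
             sensitive: Iterable[Iterable[str]],
             min_count: int) -> List[Set[str]]:
    """Greedily delete items until each sensitive itemset has support < min_count."""
    if min_count < 1:
        raise ValueError("min_count must be at least 1")
    sens = [frozenset(s) for s in sensitive]
    if any(not s for s in sens):
        raise ValueError("sensitive itemsets must be non-empty")
    work = [set(t) for t in db]  # copy, so the caller's database is untouched

    while True:
        violating = [s for s in sens if support(work, s) >= min_count]
        if not violating:
            return work
        # Target the sensitive itemset with the highest current support.
        target = max(violating, key=lambda s: support(work, s))
        candidates = [i for i, t in enumerate(work) if target <= t]
        # Modify the highest-scoring transaction (first one wins on ties).
        victim = work[max(candidates, key=lambda i: tfidf_score(work[i], sens, work))]
        # Delete the item of `target` that occurs in the most sensitive
        # itemsets inside this transaction, so one deletion can hit several.
        item = max(sorted(target),
                   key=lambda x: sum(1 for s in sens if s <= victim and x in s))
        victim.discard(item)
```

Because the deleted item always belongs to the targeted itemset, that itemset's support drops by one on every iteration, so the loop terminates. Favouring short transactions through the tf-like term is an assumed heuristic for limiting side effects on non-sensitive patterns; the paper's actual weighting may differ.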