Privacy measures for free text documents: bridging the gap between theory and practice

  • Authors:
  • Liqiang Geng;Yonghua You;Yunli Wang;Hongyu Liu

  • Affiliations:
  • National Research Council of Canada, Institute for Information Technology, Fredericton, Canada;National Research Council of Canada, Institute for Information Technology, Fredericton, Canada;National Research Council of Canada, Institute for Information Technology, Fredericton, Canada;National Research Council of Canada, Institute for Information Technology, Fredericton, Canada

  • Venue:
  • TrustBus'11 Proceedings of the 8th international conference on Trust, privacy and security in digital business
  • Year:
  • 2011

Abstract

Privacy compliance for free text documents is a challenge facing many organizations. Named entity recognition techniques and machine learning methods can be used to detect private information, such as personally identifiable information (PII) and personal health information (PHI), in free text documents. However, these methods cannot quantify how much private content a document contains. In this paper, we propose a framework to measure the privacy content of free text documents. The measure consists of two factors: the probability that the text can be used to uniquely identify a person, and the degree of sensitivity of the private entities associated with that person. We then instantiate the framework for the detection and protection of PHI in medical records, a challenge for many hospitals, clinics, and other medical institutions. We conducted experiments on a real dataset to show the effectiveness of the proposed measure.
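
To make the two-factor idea concrete, the sketch below shows one way such a measure could be combined. It is an illustration only: the abstract does not state how the two factors are computed or combined, so the product rule, the 1/k identification probability, the use of the maximum sensitivity, and all names (PrivateEntity, identification_probability, privacy_score) are assumptions and not the authors' method.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PrivateEntity:
    """A private entity found in the text (hypothetical structure)."""
    text: str
    sensitivity: float  # assumed scale: 0.0 (harmless) to 1.0 (highly sensitive)


def identification_probability(matching_population: int) -> float:
    """Assumed model: if k people match the identifying details in the
    text, the chance of singling out one specific person is 1/k."""
    if matching_population < 1:
        raise ValueError("matching_population must be at least 1")
    return 1.0 / matching_population


def privacy_score(matching_population: int,
                  entities: Iterable[PrivateEntity]) -> float:
    """Combine the two factors as a product (an assumption, not taken
    from the paper): identification probability times the highest
    sensitivity among the entities tied to the person."""
    p_identify = identification_probability(matching_population)
    max_sensitivity = max((e.sensitivity for e in entities), default=0.0)
    return p_identify * max_sensitivity


if __name__ == "__main__":
    record = [PrivateEntity("diabetes", 0.5), PrivateEntity("flu", 0.2)]
    # Four people share the identifying details mentioned in the record.
    print(privacy_score(4, record))
```

Under these assumptions, a document scores high only when it both narrows the candidate population and mentions sensitive entities; a document with no private entities scores zero however identifying it is.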