Privacy measures for free text documents: bridging the gap between theory and practice

Authors:
Liqiang Geng;Yonghua You;Yunli Wang;Hongyu Liu
Affiliations:
National Research Council of Canada, Institute for Information Technology, Fredericton, Canada (all authors)
Venue:
TrustBus'11 Proceedings of the 8th international conference on Trust, privacy and security in digital business
Year:
2011

Abstract

Privacy compliance for free text documents is a challenge facing many organizations. Named entity recognition techniques and machine learning methods can detect private information, such as personally identifiable information (PII) and personal health information (PHI), in free text documents. However, these methods cannot measure the level of privacy embodied in the documents. In this paper, we propose a framework to measure the privacy content of free text documents. The measure consists of two factors: the probability that the text can be used to uniquely identify a person, and the degree of sensitivity of the private entities associated with that person. We then instantiate the framework for the detection and protection of PHI in medical records, a challenge for many hospitals, clinics, and other medical institutions. Experiments on a real dataset show the effectiveness of the proposed measure.
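
The two-factor idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two factors are combined, so everything below is an assumption made for illustration only: the names `PrivateEntity` and `privacy_score` are hypothetical, sensitivity is assumed to lie in [0, 1], and the combination rule (identification probability multiplied by the highest entity sensitivity) is a stand-in, not the paper's formula.

```python
from dataclasses import dataclass


@dataclass
class PrivateEntity:
    """A private entity found in a document (hypothetical type)."""
    text: str
    sensitivity: float  # assumed scale: 0.0 (harmless) to 1.0 (highly sensitive)


def privacy_score(identification_probability: float,
                  entities: list[PrivateEntity]) -> float:
    """Combine the two factors into one score in [0, 1].

    Assumption: the score is the probability of uniquely identifying
    the person times the highest sensitivity among associated entities.
    This rule is illustrative and is not taken from the paper.
    """
    if not 0.0 <= identification_probability <= 1.0:
        raise ValueError("identification_probability must be in [0, 1]")
    for entity in entities:
        if not 0.0 <= entity.sensitivity <= 1.0:
            raise ValueError(f"sensitivity out of range for {entity.text!r}")
    if not entities:
        # No private entities: nothing sensitive can be linked to the person.
        return 0.0
    max_sensitivity = max(entity.sensitivity for entity in entities)
    return identification_probability * max_sensitivity


# Example with made-up values: a record that identifies a person with
# probability 0.5 and mentions two health-related entities.
example_entities = [
    PrivateEntity("diabetes", 0.8),
    PrivateEntity("seasonal flu", 0.2),
]
example_score = privacy_score(0.5, example_entities)
```

Under this assumed rule, a document scores zero when either factor is zero: text that cannot identify anyone, or text that identifies a person but links no sensitive entity to them. Other combinations (for example, summing sensitivities) would be equally plausible readings of the abstract.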