Scanning electronic documents for personally identifiable information

  • Authors:
  • Tuomas Aura;Thomas A. Kuhn;Michael Roe

  • Affiliations:
  • Microsoft Research, Cambridge, UK;Technische Universität München, Munich, Germany;Microsoft Research, Cambridge, UK

  • Venue:
  • Proceedings of the 5th ACM workshop on Privacy in electronic society
  • Year:
  • 2006

Abstract

Sometimes, it is necessary to remove author names and other personally identifiable information (PII) from documents before publication. We have implemented a novel defensive tool for detecting such data automatically. By using the detection tool, we have learned about where PII may be stored in documents and how it is put there. A key observation is that, contrary to common belief, user and machine identifiers and other metadata are not embedded in documents only by a single piece of software, such as a word processor, but by various tools used at different stages of the document authoring process.