Efficient techniques for document sanitization

  • Authors:
  • Venkatesan T. Chakaravarthy; Himanshu Gupta; Prasan Roy; Mukesh K. Mohania

  • Affiliations:
  • IBM India Research Lab, New Delhi, India; IBM India Research Lab, New Delhi, India; Aster Data Systems, Redwood City, CA, USA; IBM India Research Lab, New Delhi, India

  • Venue:
  • Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM)
  • Year:
  • 2008

Abstract

Sanitizing a document means removing sensitive information from it so that it can be distributed to a broader audience. Such sanitization is needed when declassifying documents that contain sensitive or confidential information, such as corporate emails, intelligence reports, and medical records. In this paper, we present the ERASE framework for performing document sanitization automatically. ERASE can sanitize a document dynamically, so that different users get different views of the same document based on what they are authorized to know. We formalize the problem and present the algorithms ERASE uses to find the appropriate terms to remove from a document. Our preliminary experimental study demonstrates the efficiency and efficacy of the proposed algorithms.