Efficient techniques for document sanitization

Authors:
Venkatesan T. Chakaravarthy;Himanshu Gupta;Prasan Roy;Mukesh K. Mohania
Affiliations:
IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;Aster Data Systems, Redwood City, CA, USA;IBM India Research Lab, New Delhi, India
Venue:
Proceedings of the 17th ACM conference on Information and knowledge management
Year:
2008

Abstract

Sanitizing a document means removing sensitive information from it so that it can be distributed to a broader audience. Sanitization is needed when declassifying documents that contain sensitive or confidential information, such as corporate emails, intelligence reports, and medical records. In this paper, we present ERASE, a framework for automated document sanitization. ERASE can be used to sanitize a document dynamically, so that different users get different views of the same document based on what they are authorized to know. We formalize the problem and present the algorithms ERASE uses to find the appropriate terms to remove from a document. Our preliminary experimental study demonstrates the efficiency and efficacy of the proposed algorithms.
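
The abstract does not spell out the ERASE algorithms, so the following is only an illustrative sketch of the general idea of dynamic sanitization, not the authors' method. It assumes a hypothetical table (`SENSITIVE_TERMS`) that maps each sensitive term to the clearance level needed to see it, and a hypothetical `sanitize` function that masks every term the reader is not authorized to know. How ERASE actually selects the terms to remove is not modeled here.

```python
import re

# Hypothetical data: sensitive term -> minimum clearance level required to
# see it. These terms and levels are assumptions made for illustration.
SENSITIVE_TERMS = {
    "acme": 1,
    "chemotherapy": 2,
    "lymphoma": 2,
}


def sanitize(text, clearance, sensitive_terms=SENSITIVE_TERMS, mask="[REDACTED]"):
    """Return a view of `text` with terms above `clearance` masked.

    A term is masked when it appears in `sensitive_terms` and the reader's
    clearance is lower than the level required for that term.
    """

    def replace(match):
        word = match.group(0)
        required = sensitive_terms.get(word.lower())
        if required is not None and clearance < required:
            return mask
        return word

    # Visit every alphabetic word once; re.sub does not rescan replacements.
    return re.sub(r"[A-Za-z]+", replace, text)


document = "Acme employee began chemotherapy for lymphoma."

# The same document yields a different view per clearance level.
public_view = sanitize(document, clearance=0)
partner_view = sanitize(document, clearance=1)
physician_view = sanitize(document, clearance=2)
```

With this sketch, a reader at clearance 0 sees all three sensitive terms masked, a reader at clearance 1 sees the company name but not the medical terms, and a reader at clearance 2 sees the original text. A fixed term list is the simplest possible policy; it ignores the harder problem the paper targets, namely deciding which terms must be removed so that the remaining text does not reveal the protected information.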