Sanitization's slippery slope: the design and study of a text revision assistant

Authors:
Richard Chow;Ian Oberst;Jessica Staddon
Affiliations:
PARC;Oregon State University;PARC
Venue:
Proceedings of the 5th Symposium on Usable Privacy and Security
Year:
2009

Abstract

For privacy reasons, sensitive content may be revised before it is released. The revision often consists of redaction, that is, the "blacking out" of sensitive words and phrases. Redaction has the side effect of reducing the utility of the content, often so much that the content is no longer useful. Consequently, government agencies and others are increasingly exploring the revision of sensitive content as an alternative to redaction that preserves more content utility. We call this practice sanitization. In a sanitized document, names might be replaced with pseudonyms and sensitive attributes might be replaced with hypernyms. Sanitization adds to redaction the challenge of determining what words and phrases reduce the sensitivity of content. We have designed and developed a tool to assist users in sanitizing sensitive content. Our tool leverages the Web to automatically identify sensitive words and phrases and quickly evaluates revisions for sensitivity. The tool, however, does not identify all sensitive terms and mistakenly marks some innocuous terms as sensitive. This is unavoidable because of the difficulty of the underlying inference problem and is the main reason we have designed a sanitization assistant as opposed to a fully automated tool. We have conducted a small study of our tool in which users sanitize biographies of celebrities, both with and without our tool, to hide the celebrity's identity. The user study suggests that while the tool is very valuable in encouraging users to preserve content utility and can preserve privacy, this usefulness and apparent authoritativeness may lead to a "slippery slope" in which users neglect their own judgment in favor of the tool's.
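
The difference between redaction and sanitization described in the abstract can be illustrated with a minimal sketch. This is not the authors' tool: the abstract says the tool uses the Web to identify sensitive terms, whereas the sketch below assumes the sensitive terms and their replacements are already known. The function names, the lookup tables, and the example sentence are all hypothetical and invented for illustration.

```python
# Hypothetical lookup tables (assumptions, not taken from the paper).
PSEUDONYMS = {"Jane Doe": "Person A"}                        # name -> pseudonym
HYPERNYMS = {"cardiologist": "doctor", "Portland": "a city"}  # attribute -> more general term


def redact(text, sensitive_terms):
    """Black out each sensitive term with a block of the same length."""
    # Longest terms first, so a short term never splits a longer one.
    for term in sorted(sensitive_terms, key=len, reverse=True):
        text = text.replace(term, "█" * len(term))
    return text


def sanitize(text, pseudonyms, hypernyms):
    """Replace names with pseudonyms and attributes with hypernyms."""
    replacements = {**pseudonyms, **hypernyms}
    for term in sorted(replacements, key=len, reverse=True):
        text = text.replace(term, replacements[term])
    return text


if __name__ == "__main__":
    sentence = "Jane Doe is a cardiologist in Portland."
    print(redact(sentence, list(PSEUDONYMS) + list(HYPERNYMS)))
    print(sanitize(sentence, PSEUDONYMS, HYPERNYMS))
```

The redacted sentence keeps only its grammatical skeleton, while the sanitized one still reads as prose and conveys general information. Choosing replacements that actually lower sensitivity is the hard part that the abstract attributes to the underlying inference problem, and this sketch does not attempt it.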