Inference control to protect sensitive information in text documents

Authors: Chad Cumby, Rayid Ghani
Affiliation: Accenture Technology Labs, Chicago (both authors)
Venue: ACM SIGKDD Workshop on Intelligence and Security Informatics
Year: 2010

Abstract

This paper describes a framework and algorithms for ensuring a specified level of privacy in text data sets. Recent work has attempted to quantify the likelihood of privacy breaches for text data. We build on these notions to provide a means of controlling such breaches, couched in a multi-class classification framework. Our framework, called Text Inference Control, gives the user fine-grained control over the level of privacy needed for sensitive concepts present in the data. The framework is also designed to respect a user-defined utility metric on the data, which our methods try to maximize while redacting. We report encouraging results on multiple data sets, protecting the sensitive category while maximizing preservation of the utility category, against both automated attackers and human subjects.
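
To make the idea of redacting under a multi-class classifier concrete, the following is a minimal sketch and not the authors' algorithm. Everything in it is an assumption made for illustration: the toy naive Bayes model, the protection criterion (at least k-1 other sensitive classes must score at least as high as the true one), the greedy selection rule, the `lam` trade-off weight, and all function names and data.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Toy multinomial naive Bayes: {label: {word: log P(word | label)}}."""
    vocab = {w for d in docs for w in d}
    counts = {}
    for d, y in zip(docs, labels):
        counts.setdefault(y, Counter()).update(d)
    model = {}
    for y, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)
        model[y] = {w: math.log((c[w] + alpha) / total) for w in vocab}
    return model

def scores(model, words):
    """Log-likelihood of the words under each class (uniform prior, unknown words ignored)."""
    return {y: sum(lp.get(w, 0.0) for w in words) for y, lp in model.items()}

def evidence(model, word, label):
    """How much more `word` supports `label` than the average other class."""
    others = [lp.get(word, 0.0) for y, lp in model.items() if y != label]
    return model[label].get(word, 0.0) - sum(others) / len(others)

def is_protected(model, words, true_label, k):
    """Assumed criterion: at least k-1 other classes score >= the true class."""
    s = scores(model, words)
    rivals = sum(1 for y, v in s.items() if y != true_label and v >= s[true_label])
    return rivals >= k - 1

def redact(sens_model, util_model, words, sens_label, util_label, k=2, lam=1.0):
    """Greedily redact terms that reveal sens_label while sparing utility terms."""
    redacted, remaining = set(), list(words)

    def value(w):
        # Benefit to privacy minus a penalty for lost utility evidence.
        cost = max(evidence(util_model, w, util_label), 0.0)
        return evidence(sens_model, w, sens_label) - lam * cost

    while remaining and not is_protected(sens_model, remaining, sens_label, k):
        candidates = sorted(w for w in set(remaining)
                            if evidence(sens_model, w, sens_label) > 0)
        if not candidates:
            break  # no remaining term reveals the sensitive class
        best = max(candidates, key=value)
        redacted.add(best)
        remaining = [w for w in remaining if w != best]
    masked = ["[REDACTED]" if w in redacted else w for w in words]
    return masked, redacted

# Hypothetical training data: sensitive categories and utility categories.
sens_model = train_nb(
    [["diabetes", "insulin", "doctor", "clinic"],
     ["loan", "mortgage", "bank", "interest"],
     ["vacation", "flight", "hotel", "beach"]],
    ["health", "finance", "travel"])
util_model = train_nb(
    [["refund", "late", "broken", "angry"],
     ["thanks", "great", "helpful", "happy"]],
    ["complaint", "praise"])

doc = ["refund", "late", "insulin", "doctor", "clinic", "bank", "angry"]
masked, redacted = redact(sens_model, util_model, doc, "health", "complaint", k=2)
print(masked)
```

The sketch stops as soon as the assumed criterion is met, so it redacts no more terms than its greedy rule requires. A real system would need a stronger attacker model than a single bag-of-words classifier.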