Document redaction is widely used to protect sensitive information in published documents. In a basic redaction system, sensitive and identifying terms are removed from the document. Web-based inference is an attack on redaction systems in which the redacted document is linked with other publicly available documents to infer the removed parts. Web-based inference also provides an approach for detecting unwanted inferences and thus for constructing secure redaction systems. Previous work on web-based inference used general keyword extraction methods for document representation. We propose a systematic approach, based on information-theoretic concepts and measures, to rank the words in a document for the purpose of inference detection. We extend our results to the case of multiple sensitive words and propose a metric that takes into account possible relationships among the sensitive words and yields an effective and efficient inference detection system. Through a number of experiments, we show that our approach, when used for document redaction, substantially reduces the number of inferences that are left in a document. We describe our approach, present the experimental results, and outline future work.
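To make the idea of information-theoretic word ranking concrete, the following is a minimal sketch and not the method of the paper. The abstract does not name the measure used, so the sketch assumes pointwise mutual information (PMI) computed from document-level co-occurrence counts. The function name `rank_by_pmi`, the toy corpus, and the whitespace tokenization are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def rank_by_pmi(documents, sensitive_word):
    """Rank words by document-level PMI with `sensitive_word`.

    Assumption: PMI stands in for the unspecified information-theoretic
    measure. Words with high PMI co-occur with the sensitive word more
    often than chance predicts, so they are candidates for redaction.
    """
    n_docs = len(documents)
    # Represent each document as a set of lowercase whitespace tokens.
    doc_sets = [set(doc.lower().split()) for doc in documents]

    # Document frequency of every word.
    df = Counter(word for words in doc_sets for word in words)
    # Number of documents in which a word appears with the sensitive word.
    co = Counter(
        word
        for words in doc_sets
        if sensitive_word in words
        for word in words
        if word != sensitive_word
    )

    p_sensitive = df[sensitive_word] / n_docs
    scores = {}
    for word, joint_count in co.items():
        p_word = df[word] / n_docs
        p_joint = joint_count / n_docs
        scores[word] = math.log2(p_joint / (p_word * p_sensitive))

    # Highest PMI first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus standing in for publicly available documents.
corpus = [
    "alice works at acme",
    "alice lives in paris",
    "bob works at acme",
    "bob lives in rome",
]
ranked = rank_by_pmi(corpus, "alice")
print(ranked[0])
```

In this toy corpus, "paris" appears only alongside "alice", so it ranks first, while words shared equally with "bob" score zero. A real system would estimate these probabilities from web search results or a large corpus and would need smoothing for rare words.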