An information theoretic framework for web inference detection

Authors:
Hoi Le Thi; Reihaneh Safavi-Naini
Affiliations:
University of Calgary, Calgary, AB, Canada (both authors)
Venue:
Proceedings of the 5th ACM workshop on Security and artificial intelligence
Year:
2012

Abstract

Document redaction is widely used to protect sensitive information in published documents. In a basic redaction system, sensitive and identifying terms are removed from the document. Web-based inference is an attack on redaction systems in which the redacted document is linked with other publicly available documents to infer the removed parts. Web-based inference also provides an approach for detecting unwanted inferences, and hence for constructing secure redaction systems. Previous work on web-based inference used general keyword extraction methods for document representation. We propose a systematic approach, based on information-theoretic concepts and measures, to rank the words in a document for the purpose of inference detection. We extend our results to the case of multiple sensitive words and propose a metric that takes into account possible relationships among the sensitive words, resulting in an effective and efficient inference detection system. Through a number of experiments, we show that our approach, when used for document redaction, substantially reduces the number of inferences left in a document. We describe our approach, present the experimental results, and outline future work.
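As a rough illustration of ranking the words of a document with an information-theoretic measure, the sketch below scores candidate words by their pointwise mutual information (PMI) with a single sensitive word, estimated from co-occurrence counts. PMI is a standard association measure and is used here only as an assumed stand-in. The abstract does not say which measure the paper uses, so this is not the authors' method. The function names `pmi` and `rank_by_pmi`, the example words, and all counts are hypothetical. In a web setting the counts might come from search-engine hit counts, which is also an assumption.

```python
import math
from typing import Dict, List, Tuple


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information log2(P(x,y) / (P(x) P(y))) from raw counts.

    With P(x,y) = count_xy/total, P(x) = count_x/total and P(y) = count_y/total,
    the ratio simplifies to count_xy * total / (count_x * count_y).
    """
    if count_x <= 0 or count_y <= 0:
        raise ValueError("marginal counts must be positive")
    if count_xy == 0:
        return float("-inf")  # the words never co-occur in the counted collection
    return math.log2(count_xy * total / (count_x * count_y))


def rank_by_pmi(sensitive: str, candidates: List[str], counts: Dict[str, int],
                joint: Dict[Tuple[str, str], int], total: int) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank candidate words by PMI with the sensitive word, highest first."""
    scored = [
        (w, pmi(joint.get((w, sensitive), 0), counts[w], counts[sensitive], total))
        for w in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy counts: number of documents containing each word or word pair.
TOTAL = 1_000_000
counts = {"diabetes": 10_000, "insulin": 5_000, "glucose": 8_000, "weather": 50_000}
joint = {
    ("insulin", "diabetes"): 2_000,
    ("glucose", "diabetes"): 1_600,
    ("weather", "diabetes"): 500,
}

ranked = rank_by_pmi("diabetes", ["weather", "insulin", "glucose"], counts, joint, TOTAL)
for word, score in ranked:
    print(f"{word}: {score:.2f}")
# prints, one per line: insulin: 5.32, glucose: 4.32, weather: 0.00
```

Words with the highest scores co-occur with the sensitive word far more often than chance would predict, so under this assumed measure they are natural candidates for further redaction. A score of 0 (as for "weather") means the pair co-occurs exactly as often as independence would predict. Handling several sensitive words would require a way to combine per-word scores, and the abstract's metric for that case is not reproduced here.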