Detecting privacy leaks using corpus-based association rules

Authors:
Richard Chow;Philippe Golle;Jessica Staddon
Affiliations:
Palo Alto Research Center, Palo Alto, CA, USA;Palo Alto Research Center, Palo Alto, CA, USA;Palo Alto Research Center, Palo Alto, CA, USA
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008

Abstract

Detecting inferences in documents is critical for ensuring privacy when sharing information. In this paper, we propose a refined and practical model of inference detection using a reference corpus. Our model is inspired by association rule mining: inferences are based on word co-occurrences. Using the model and taking the Web as the reference corpus, we can find inferences and measure their strength through web-mining algorithms that leverage search engines such as Google or Yahoo!. Our model also includes the important case of private corpora, to model inference detection in enterprise settings in which there is a large private document repository. We find inferences in private corpora by using analogues of our Web-mining algorithms, relying on an index for the corpus rather than a Web search engine. We present results from two experiments. The first experiment demonstrates the performance of our techniques in identifying all the keywords that allow for inference of a particular topic (e.g. "HIV") with confidence above a certain threshold. The second experiment uses the public Enron e-mail dataset. We postulate a sensitive topic and use the Enron corpus and the Web together to find inferences for the topic. These experiments demonstrate that our techniques are practical, and that our model of inference based on word co-occurrence is well-suited to efficient inference detection.
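The co-occurrence idea in the abstract can be illustrated with a small sketch. It assumes the textbook association-rule confidence, confidence(keyword => topic) = (documents containing both) / (documents containing the keyword); the abstract does not state the paper's exact scoring formula, so this is an assumption rather than the authors' method. The in-memory inverted index stands in for the private-corpus index (or for search-engine hit counts when the Web is the reference corpus). The function names and the toy corpus are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def confidence(index, keyword, topic):
    """Textbook association-rule confidence for the rule keyword => topic.

    Assumption: the paper's actual measure may differ from this ratio.
    """
    with_keyword = index.get(keyword, set())
    if not with_keyword:
        return 0.0
    with_both = with_keyword & index.get(topic, set())
    return len(with_both) / len(with_keyword)


def inferring_keywords(index, topic, threshold):
    """Return, sorted, the keywords whose confidence for topic meets the threshold."""
    return sorted(
        word
        for word in index
        if word != topic and confidence(index, word, topic) >= threshold
    )


# Hypothetical toy corpus, for illustration only.
docs = [
    "patient started antiretroviral therapy for hiv",
    "antiretroviral drugs suppress hiv",
    "patient reported seasonal allergy",
    "clinic visit for allergy",
]
index = build_index(docs)

print(confidence(index, "antiretroviral", "hiv"))
print(confidence(index, "patient", "hiv"))
print(inferring_keywords(index, "hiv", 0.9))
```

With a Web search engine in place of the local index, the two set sizes would be replaced by hit counts for the query "keyword topic" and for "keyword" alone; which engine API to call is left open here.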