Web-based inference detection

Authors: Jessica Staddon; Philippe Golle; Bryce Zimny
Affiliations: Palo Alto Research Center; Palo Alto Research Center; University of Waterloo
Venue: SS'07: Proceedings of the 16th USENIX Security Symposium
Year: 2007

Abstract

Newly published data, when combined with existing public knowledge, allows for complex and sometimes unintended inferences. We propose semi-automated tools for detecting these inferences prior to releasing data. Our tools give data owners a fuller understanding of the implications of releasing data and help them adjust the amount of data they release to avoid unwanted inferences. Our tools first extract salient keywords from the private data intended for release. Then, within a reference corpus (such as the public Web) that encapsulates as much relevant public knowledge as possible, they issue search queries for documents that match subsets of these keywords. Finally, our tools parse the documents returned by the search queries for keywords not present in the original private data. These additional keywords allow us to automatically estimate the likelihood of certain inferences. Potentially dangerous inferences are flagged for manual review. We call this new technology Web-based inference control. The paper reports on two experiments that demonstrate early successes of this technology. The first experiment uses our tools to automatically estimate the risk that an anonymous document allows re-identification of its author. The second experiment uses our tools to detect the risk that a document is linked to a sensitive topic. These experiments, while simple, capture the full complexity of inference detection and illustrate the power of our approach.
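The pipeline described above (extract keywords, search for documents matching keyword subsets, collect new terms, estimate and flag inferences) can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: a small in-memory `REFERENCE_CORPUS` stands in for the Web, TF-IDF stands in for salience scoring, and the likelihood of an inference is estimated with a simple co-occurrence ratio. All names (`extract_keywords`, `search`, `detect_inferences`, `threshold`, and so on) are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy reference corpus standing in for "the public Web".
REFERENCE_CORPUS = [
    "the senator chairs the finance committee and represents ohio",
    "the finance committee senator was born in dayton ohio",
    "a finance committee hearing on banking reform",
    "the senator spoke about farm subsidies",
    "weather in dayton ohio is mild",
]

# Hypothetical stopword list; a real tool would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "was"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_keywords(private_text, corpus, k=3):
    """Step 1: rank private-text terms by TF-IDF against the corpus (assumed salience measure)."""
    docs = [set(tokenize(d)) for d in corpus]
    tf = Counter(tokenize(private_text))
    n = len(docs)

    def idf(term):
        df = sum(term in d for d in docs)
        return math.log((n + 1) / (df + 1)) + 1.0  # smoothed, always positive

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(tf, key=lambda t: (-tf[t] * idf(t), t))[:k]


def search(terms, corpus):
    """Step 2 (toy stand-in for a search engine): documents containing every term."""
    wanted = set(terms)
    return [d for d in corpus if wanted <= set(tokenize(d))]


def detect_inferences(private_text, corpus, k=3, subset_size=2, threshold=0.6):
    """Steps 3-4: find terms absent from the private text and flag likely inferences.

    The confidence of "subset -> term" is the fraction of documents matching
    the keyword subset that also contain the term (an assumed co-occurrence
    ratio, not necessarily the paper's estimator).
    """
    private_terms = set(tokenize(private_text))
    keywords = extract_keywords(private_text, corpus, k)
    flagged = []
    for subset in combinations(keywords, subset_size):
        hits = search(subset, corpus)
        if not hits:
            continue
        # Count each new term at most once per matching document.
        new_terms = Counter(
            t for d in hits for t in set(tokenize(d)) if t not in private_terms
        )
        for term, count in new_terms.items():
            ratio = count / len(hits)
            if ratio >= threshold:
                flagged.append((subset, term, ratio))
    # Most confident first; these are the candidates for manual review.
    return sorted(flagged, key=lambda f: (-f[2], f[1], f[0]))


if __name__ == "__main__":
    memo = "The senator on the finance committee."
    for subset, term, ratio in detect_inferences(memo, REFERENCE_CORPUS):
        print(subset, "->", term, f"{ratio:.2f}")
```

On this toy corpus the memo never names a state, yet every flagged inference points to `ohio`, the kind of unintended inference the abstract describes. Replacing `search` with queries to a real search engine, and raising `k` or `subset_size`, would trade more queries for broader coverage.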