Newly published data, when combined with existing public knowledge, allows for complex and sometimes unintended inferences. We propose semi-automated tools for detecting these inferences before data is released. The tools give data owners a fuller understanding of the implications of a release and help them adjust the amount of data they release to avoid unwanted inferences. The tools first extract salient keywords from the private data intended for release. They then issue search queries for documents that match subsets of these keywords within a reference corpus (such as the public Web) that encapsulates as much relevant public knowledge as possible. Finally, they parse the documents returned by the search queries for keywords not present in the original private data. These additional keywords allow us to automatically estimate the likelihood of certain inferences, and potentially dangerous inferences are flagged for manual review. We call this new technology Web-based inference control. The paper reports on two experiments that demonstrate early successes of this technology. The first uses our tools to automatically estimate the risk that an anonymous document allows re-identification of its author. The second uses our tools to detect the risk that a document is linked to a sensitive topic. These experiments, while simple, capture the full complexity of inference detection and illustrate the power of our approach.
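The three steps described above (keyword extraction, subset queries against a reference corpus, and analysis of keywords that appear only in the returned documents) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the tools reported in the paper: the function names are hypothetical, a small in-memory list of strings stands in for the public Web, plain term frequency stands in for whatever keyword extraction the authors use, and the score is a crude co-occurrence fraction, not the paper's likelihood estimate.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was", "for", "on", "with", "by"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_keywords(text, top_n):
    """Step 1 (simplified): treat the most frequent terms as the salient keywords."""
    return [word for word, _ in Counter(tokenize(text)).most_common(top_n)]


def search(corpus, query_terms):
    """Step 2 (stand-in for a Web search): documents containing every query term."""
    wanted = set(query_terms)
    return [doc for doc in corpus if wanted <= set(tokenize(doc))]


def detect_inferences(private_text, corpus, top_n=3, subset_size=2, threshold=0.7):
    """Step 3: score terms that occur in search hits but not in the private text.

    A term's score is the fraction of (query, document) hits that contain it.
    Terms scoring at or above `threshold` are returned for manual review.
    """
    private_terms = set(tokenize(private_text))
    keywords = extract_keywords(private_text, top_n)
    hits = 0
    new_term_counts = Counter()
    for subset in combinations(keywords, subset_size):
        for doc in search(corpus, subset):
            hits += 1
            new_term_counts.update(set(tokenize(doc)) - private_terms)
    if hits == 0:
        return {}
    return {
        term: count / hits
        for term, count in new_term_counts.items()
        if count / hits >= threshold
    }


if __name__ == "__main__":
    # Hypothetical "anonymous" text and reference corpus, invented for this sketch.
    private_text = (
        "The founder of a robotics startup in Springfield. "
        "The robotics startup hired in Springfield."
    )
    corpus = [
        "Jane Doe runs a robotics startup in Springfield",
        "Jane Doe spoke about robotics in Springfield",
        "A startup in Springfield sells coffee",
        "Weather report for Shelbyville",
    ]
    for term, score in sorted(detect_inferences(private_text, corpus).items()):
        print(f"review: {term} (score {score:.2f})")
```

In this toy run the name terms co-occur with most keyword subsets even though they never appear in the private text, which is the kind of re-identification signal the first experiment targets. Lowering `threshold` surfaces weaker associations at the cost of more manual review.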