Detecting inferences in documents is critical for ensuring privacy when sharing information. In this paper, we propose a refined and practical model of inference detection using a reference corpus. Our model is inspired by association rule mining: inferences are based on word co-occurrences. Using the model and taking the Web as the reference corpus, we can find inferences and measure their strength through Web-mining algorithms that leverage search engines such as Google or Yahoo!. Our model also covers the important case of private corpora, which captures inference detection in enterprise settings that have a large private document repository. We find inferences in private corpora by using analogues of our Web-mining algorithms that rely on an index for the corpus rather than a Web search engine. We present results from two experiments. The first experiment demonstrates the performance of our techniques in identifying all the keywords that allow inference of a particular topic (e.g., "HIV") with confidence above a certain threshold. The second experiment uses the public Enron e-mail dataset: we postulate a sensitive topic and use the Enron corpus and the Web together to find inferences for the topic. These experiments demonstrate that our techniques are practical and that our model of inference based on word co-occurrence is well suited to efficient inference detection.
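The co-occurrence idea for a private corpus can be illustrated with a minimal sketch. It assumes the standard association-rule definition of confidence, conf(keyword -> topic) = (documents containing both) / (documents containing the keyword); the abstract does not state the exact measure the paper uses, so this is an assumption. The function names, the toy documents, and the `min_support` parameter are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word.strip('.,;:!?"()')].add(doc_id)
    index.pop("", None)  # drop tokens that were pure punctuation
    return index


def confidence(index, keyword, topic):
    """Assumed measure: fraction of documents with `keyword` that also contain `topic`."""
    with_keyword = index.get(keyword, set())
    if not with_keyword:
        return 0.0
    return len(with_keyword & index.get(topic, set())) / len(with_keyword)


def inferring_keywords(index, topic, threshold, min_support=1):
    """Return {keyword: confidence} for keywords whose confidence meets the threshold."""
    results = {}
    for word, doc_ids in index.items():
        if word == topic or len(doc_ids) < min_support:
            continue
        score = confidence(index, word, topic)
        if score >= threshold:
            results[word] = score
    return results


# Toy stand-in for a private document repository (illustrative data only).
docs = [
    "patient started antiretroviral therapy for hiv",
    "antiretroviral drugs suppress hiv replication",
    "the patient has seasonal flu",
    "flu therapy includes rest",
]
index = build_index(docs)
print(inferring_keywords(index, "hiv", threshold=0.9, min_support=2))
```

With the Web as the reference corpus, the two document counts would instead come from search-engine hit counts for the keyword alone and for the keyword together with the topic; that variant is not sketched here because it depends on a particular search API.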