A recently proposed approach to addressing privacy concerns in storing web search query logs is to bundle the logs of multiple users together. In this work we investigate the privacy leaks that remain possible even when query logs from multiple users are bundled together without any user or session identifiers. We begin by quantifying users' propensity to issue own-name vanity queries and geographically revealing queries. We show that these propensities interact badly with two forms of vulnerability in the bundling scheme. First, structural vulnerabilities arise from properties of the heavy tail of the user search frequency distribution, or of the distribution of locations that appear within a user's queries. These heavy tails may cause a user to appear visibly different from other users in the same bundle. Second, we demonstrate analytical vulnerabilities based on the ability to separate the queries in a bundle into threads corresponding to individual users. These vulnerabilities raise privacy issues that suggest bundling must be handled with great care.
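As a rough illustration of the structural vulnerability involving locations, the toy sketch below measures how strongly a single place name dominates the location mentions in one bundle. It is not the method used in this work: the function name `dominant_location_share`, the sample queries, and the tiny location list are all hypothetical, and matching is by exact single-word token only.

```python
from collections import Counter


def dominant_location_share(bundle_queries, known_locations):
    """Return (location, share) for the most-mentioned location in a bundle.

    Hypothetical helper for illustration only. `share` is the fraction of
    all location mentions in the bundle that refer to the top location.
    Only single-word, lower-case location names are matched.
    """
    counts = Counter()
    for query in bundle_queries:
        tokens = set(query.lower().split())
        for location in known_locations:
            if location in tokens:
                counts[location] += 1
    if not counts:
        return None, 0.0
    location, n = counts.most_common(1)[0]
    return location, n / sum(counts.values())


# A made-up bundle with no user or session identifiers attached.
bundle = [
    "pizza delivery springfield",
    "springfield high school reunion",
    "weather springfield",
    "cheap flights",
    "plumber shelbyville",
    "movie times",
]
locations = {"springfield", "shelbyville"}

result = dominant_location_share(bundle, locations)
print(result)
```

In this made-up bundle, three of the four location mentions name the same town, so anyone reading the bundle could plausibly guess where at least one contributor lives even though no identifiers are present.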