We investigate the subtle cues to user identity that may be exploited in attacks on the privacy of users in web search query logs. We study the application of simple classifiers to map a sequence of queries into the gender, age, and location of the user issuing the queries. We then show how these classifiers may be carefully combined at multiple granularities to map a sequence of queries into a set of candidate users that is 300-600 times smaller than random chance would allow. We show that this approach remains accurate even after removing personally identifiable information such as names/numbers or limiting the size of the query log. We also present a new attack in which a real-world acquaintance of a user attempts to identify that user in a large query log, using personal information. We show that combinations of small pieces of information about terms a user would probably search for can be highly effective in identifying the sessions of that user. We conclude that known schemes to release even heavily scrubbed query logs that contain session information have significant privacy risks.