Search log k-anonymization is based on eliminating infrequent queries under exact (or nearly exact) matching conditions, which usually causes substantial data loss and impairs utility. We present a more flexible, semantic approach to k-anonymity that consists of three steps: query concept mining, automatic query expansion, and affinity assessment of the expanded queries. Based on the observation that many infrequent queries can be seen as refinements of a more general frequent query, we first model query concepts as probabilistically weighted n-grams and extract them from the search log data. Then, after expanding the original log queries with their weighted concepts, we find all the k-affine expanded queries under a given affinity threshold Θ, modeled as a generalized k-core of the graph of Θ-affine queries. In experiments with the AOL data set, we show that this approach achieves levels of privacy comparable to those of plain k-anonymity while substantially reducing data loss.
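The three steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: concepts are word n-grams weighted by the fraction of log queries containing them, affinity is the cosine similarity of the expanded queries, and an ordinary k-core (iteratively removing queries with fewer than k affine neighbours) stands in for the generalized k-core, whose exact definition the abstract does not give. All function names and the `max_n` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def ngrams(query, max_n=2):
    """Return all word n-grams of the query for n = 1..max_n."""
    tokens = query.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def mine_concepts(log, max_n=2):
    """Step 1 (assumed weighting): weight each n-gram by the fraction
    of log queries in which it occurs."""
    counts = Counter()
    for query in log:
        counts.update(set(ngrams(query, max_n)))
    return {gram: c / len(log) for gram, c in counts.items()}


def expand(query, concepts, max_n=2):
    """Step 2: represent a query by its weighted concepts."""
    return {g: concepts[g] for g in set(ngrams(query, max_n)) if g in concepts}


def affinity(a, b):
    """Step 3 (assumed measure): cosine similarity of two expanded queries."""
    dot = sum(w * b[g] for g, w in a.items() if g in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def k_affine_queries(log, k, theta, max_n=2):
    """Return the distinct queries that survive in the plain k-core of the
    graph linking queries whose affinity is at least theta."""
    queries = sorted(set(log))
    concepts = mine_concepts(log, max_n)
    vectors = {q: expand(q, concepts, max_n) for q in queries}

    # Build the graph of theta-affine queries.
    neighbours = {q: set() for q in queries}
    for a, b in combinations(queries, 2):
        if affinity(vectors[a], vectors[b]) >= theta:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Peel nodes of degree < k until none remain (standard k-core).
    while True:
        weak = [q for q, nb in neighbours.items() if len(nb) < k]
        if not weak:
            return set(neighbours)
        for q in weak:
            for other in neighbours.pop(q):
                if other in neighbours:
                    neighbours[other].discard(q)
```

In this sketch, a rare refinement such as "cheap flights rome" can be retained because it is affine to several other queries sharing the frequent concept "cheap flights", whereas exact-match k-anonymity would discard it. The sketch works on distinct query strings and ignores user identities, so it says nothing about the privacy levels reported in the paper.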