Semantic search log k-anonymization with generalized k-cores of query concept graph

Authors: Claudio Carpineto; Giovanni Romano
Affiliations: Fondazione Ugo Bordoni, Rome, Italy (both authors)
Venue: ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Year: 2013

Abstract

Search log k-anonymization is based on eliminating infrequent queries under exact (or nearly exact) matching conditions, which usually results in substantial data loss and impaired utility. We present a more flexible, semantic approach to k-anonymity that consists of three steps: query concept mining, automatic query expansion, and affinity assessment of the expanded queries. Based on the observation that many infrequent queries can be seen as refinements of a more general frequent query, we first model query concepts as probabilistically weighted n-grams and extract them from the search log data. Then, after expanding the original log queries with their weighted concepts, we find all the k-affine expanded queries under a given affinity threshold Θ, modeled as a generalized k-core of the graph of Θ-affine queries. Experimenting with the AOL data set, we show that this approach achieves levels of privacy comparable to those of plain k-anonymity while greatly reducing data loss.
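
The last step of the abstract (linking Θ-affine expanded queries and keeping only those inside a k-core) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' method: the abstract does not specify the affinity measure or the generalized k-core, so cosine similarity over concept weights and the standard k-core (iteratively dropping nodes with fewer than k neighbours) are used as stand-ins, and the function names, concept weights, and toy queries are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse concept-weight vectors (dicts).

    Assumption: stands in for the paper's affinity measure, which the
    abstract does not define.
    """
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def affinity_graph(expanded, theta):
    """Link every pair of expanded queries whose affinity is at least theta."""
    graph = {q: set() for q in expanded}
    queries = list(expanded)
    for i, q1 in enumerate(queries):
        for q2 in queries[i + 1:]:
            if cosine(expanded[q1], expanded[q2]) >= theta:
                graph[q1].add(q2)
                graph[q2].add(q1)
    return graph


def k_core(graph, k):
    """Nodes of the standard k-core: repeatedly drop nodes with degree < k.

    Assumption: the plain k-core is used here in place of the paper's
    generalized k-core.
    """
    alive = set(graph)
    degree = {q: len(graph[q]) for q in graph}
    stack = [q for q in alive if degree[q] < k]
    while stack:
        q = stack.pop()
        if q not in alive:
            continue  # already removed via an earlier stack entry
        alive.remove(q)
        for n in graph[q]:
            if n in alive:
                degree[n] -= 1
                if degree[n] < k:
                    stack.append(n)
    return alive


# Hypothetical expanded queries: query text -> {concept: weight}.
expanded = {
    "jaguar car price": {"jaguar car": 1.0, "price": 0.5},
    "jaguar car dealer": {"jaguar car": 1.0, "dealer": 0.5},
    "jaguar car review": {"jaguar car": 1.0, "review": 0.5},
    "rare stamp 1847": {"stamp": 1.0, "1847": 0.8},
}

released = k_core(affinity_graph(expanded, theta=0.7), k=2)
```

In this toy log, with Θ = 0.7 and k = 2, the three infrequent refinements of "jaguar car" are mutually affine and are all kept, while the isolated query is suppressed. Under exact matching, each of the four queries would have been treated as unique.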