Semantic microaggregation for the anonymization of query logs

  • Authors:
  • Arnau Erola; Jordi Castellà-Roca; Guillermo Navarro-Arribas; Vicenç Torra

  • Affiliations:
  • Departament d'Enginyeria Informàtica i Matemàtiques, UNESCO, Tarragona, Spain; Departament d'Enginyeria Informàtica i Matemàtiques, UNESCO, Tarragona, Spain; Institut d'Investigació en Intel·ligència Artificial, Consejo Superior de Investigaciones Científicas, Catalonia, Spain; Institut d'Investigació en Intel·ligència Artificial, Consejo Superior de Investigaciones Científicas, Catalonia, Spain

  • Venue:
  • PSD'10 Proceedings of the 2010 international conference on Privacy in statistical databases
  • Year:
  • 2010

Abstract

The publication of Web search logs is very useful to the scientific research community, but to preserve users' privacy the logs must first undergo an anonymization process. Random query swapping is a common protection technique that provides k-anonymity to users at the cost of some utility. Assuming that this utility loss can be reduced by swapping semantically close queries, we introduce a novel protection method that semantically microaggregates the logs using the Open Directory Project. That is, we extend a common statistical disclosure control method to protect search logs from a semantic perspective. The method has been tested on a random subset of the AOL search logs, and the resulting protected logs were observed to improve data usefulness.
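
The general idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy logs, the category paths (standing in for Open Directory Project categories), the Jaccard distance over category prefixes, the greedy grouping rule, and the function names `profile`, `distance`, `microaggregate`, and `swap_within_groups` are all assumptions made for illustration. Users are grouped into clusters of at least k members with similar category profiles, and queries are then swapped only within each cluster.

```python
import random

# Hypothetical toy data: user -> list of (query, ODP-style category path).
LOGS = {
    "u1": [("cheap flights", "Recreation/Travel"), ("hotel rome", "Recreation/Travel")],
    "u2": [("train tickets", "Recreation/Travel"), ("hiking boots", "Recreation/Outdoors")],
    "u3": [("python tutorial", "Computers/Programming"), ("rust compiler", "Computers/Programming")],
    "u4": [("linux kernel", "Computers/Software"), ("java generics", "Computers/Programming")],
}


def profile(entries):
    """Return the set of all category prefixes seen in a user's queries."""
    prefixes = set()
    for _, path in entries:
        parts = path.split("/")
        for depth in range(1, len(parts) + 1):
            prefixes.add("/".join(parts[:depth]))
    return prefixes


def distance(a, b):
    """Jaccard distance between two category-prefix sets (an assumed metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def microaggregate(logs, k):
    """Greedily group users into clusters of size k to 2k-1 (simplified rule)."""
    if len(logs) < k:
        raise ValueError("need at least k users")
    profiles = {user: profile(entries) for user, entries in logs.items()}
    remaining = sorted(logs)
    groups = []
    while len(remaining) >= 2 * k:
        seed = remaining[0]
        # Rank the other users by semantic distance to the seed user.
        nearest = sorted(
            remaining[1:],
            key=lambda u: (distance(profiles[seed], profiles[u]), u),
        )
        group = [seed] + nearest[: k - 1]
        groups.append(group)
        remaining = [u for u in remaining if u not in group]
    groups.append(remaining)  # the last k to 2k-1 users form the final group
    return groups


def swap_within_groups(logs, groups, rng):
    """Pool each group's queries, shuffle them, and deal them back round-robin."""
    protected = {}
    for group in groups:
        pool = [query for user in group for query, _ in logs[user]]
        rng.shuffle(pool)
        for i, user in enumerate(group):
            protected[user] = pool[i :: len(group)]
    return protected


groups = microaggregate(LOGS, k=2)
protected = swap_within_groups(LOGS, groups, random.Random(0))
print(groups)
print(protected)
```

With this toy input and k = 2, the two travel-oriented users fall into one group and the two computing-oriented users into the other, so every swapped query stays within a semantically related group instead of being exchanged with an arbitrary user.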