Vanity fair: privacy in querylog bundles

  • Authors:
  • Rosie Jones;Ravi Kumar;Bo Pang;Andrew Tomkins

  • Affiliations:
  • Yahoo! Research, Sunnyvale, CA, USA;Yahoo! Research, Sunnyvale, CA, USA;Yahoo! Research, Sunnyvale, CA, USA;Yahoo! Research, Sunnyvale, CA, USA

  • Venue:
  • Proceedings of the 17th ACM conference on Information and knowledge management
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

A recently proposed approach to address privacy concerns in storing web search querylogs is bundling logs of multiple users together. In this work we investigate privacy leaks that are possible even when querylogs from multiple users are bundled together, without any user or session identifiers. We begin by quantifying users' propensity to issue own-name vanity queries and geographically revealing queries. We show that these propensities interact badly with two forms of vulnerabilities in the bundling scheme. First, structural vulnerabilities arise due to properties of the heavy tail of the user search frequency distribution, or the distribution of locations that appear within a user's queries. These heavy tails may cause a user to appear visibly different from other users in the same bundle. Second, we demonstrate analytical vulnerabilities based on the ability to separate the queries in a bundle into threads corresponding to individual users. These vulnerabilities raise privacy issues suggesting that bundling must be handled with great care.