"I know what you did last summer": query logs and user privacy

  • Authors:
  • Rosie Jones;Ravi Kumar;Bo Pang;Andrew Tomkins

  • Affiliations:
  • Yahoo! Research, Sunnyvale, CA;Yahoo! Research, Sunnyvale, CA;Yahoo! Research, Sunnyvale, CA;Yahoo! Research, Sunnyvale, CA

  • Venue:
  • Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

We investigate the subtle cues to user identity that may be exploited in attacks on the privacy of users in web search query logs. We study the application of simple classifiers to map a sequence of queries into the gender, age, and location of the user issuing the queries. We then show how these classifiers may be carefully combined at multiple granularities to map a sequence of queries into a set of candidate users that is 300-600 times smaller than random chance would allow. We show that this approach remains accurate even after removing personally identifiable information such as names/numbers or limiting the size of the query log. We also present a new attack in which a real-world acquaintance of a user attempts to identify that user in a large query log, using personal information. We show that combinations of small pieces of information about terms a user would probably search for can be highly effective in identifying the sessions of that user. We conclude that known schemes to release even heavily scrubbed query logs that contain session information have significant privacy risks.