On the effectiveness of anonymizing networks for web search privacy

  • Authors:
  • Sai Teja Peddinti;Nitesh Saxena

  • Affiliations:
  • Polytechnic Institute of New York University;Polytechnic Institute of New York University

  • Venue:
  • Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security
  • Year:
  • 2011

Abstract

Web search has emerged as one of the most important applications on the internet, with several search engines available to users. These search engines commonly log and analyze user queries, which has serious privacy implications. One well-known solution to search privacy involves issuing the queries via an anonymizing network, such as Tor, thereby hiding one's identity from the search engine. A fundamental problem with this solution, however, is that user queries are still revealed to the search engine, although they are "mixed" among the queries issued by other users of the same anonymization service. In this paper, we consider the problem of identifying the queries of a user of interest (UOI) within a pool of queries received by a search engine over an anonymizing network. We demonstrate that an adversarial search engine can extract the UOI's queries, when it is equipped with only a short-term user search query history, by utilizing only the query content information and off-the-shelf machine learning classifiers. More specifically, by treating a selected set of 60 users (from the publicly available AOL search logs) as the users of interest performing web search over an anonymizing network, we show that each user's queries can be identified with 25.95% average accuracy when mixed with queries of 99 other users of the anonymization service. This average accuracy drops to 18.95% when queries of 999 other users of the anonymization service are mixed together. Although these average accuracies are modest, our results indicate that a few users of interest could be identified with accuracies as high as 80-98%, even when their queries are mixed among queries of 999 other users. Our results cast serious doubts on the effectiveness of anonymizing web search queries by means of anonymizing networks.
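The attack outlined in the abstract can be illustrated with a minimal sketch. It assumes a hand-written multinomial Naive Bayes classifier over query words, trained on a short labeled history and then applied to a mixed pool of anonymous queries. The abstract does not name the classifiers or features the authors used, so the class `QueryClassifier`, the labels `"uoi"` and `"other"`, and the toy queries below are all hypothetical and are not drawn from the AOL logs.

```python
import math
from collections import Counter


def tokenize(query):
    """Split a query into lowercase word tokens (a deliberately simple choice)."""
    return query.lower().split()


class QueryClassifier:
    """Two-class multinomial Naive Bayes with add-one (Laplace) smoothing.

    Illustrative only: the paper's actual classifiers are not specified
    in the abstract.
    """

    def __init__(self):
        self.word_counts = {"uoi": Counter(), "other": Counter()}
        self.query_counts = Counter()
        self.vocab = set()

    def train(self, labeled_queries):
        """Accumulate word and query counts from (query, label) pairs."""
        for query, label in labeled_queries:
            tokens = tokenize(query)
            self.word_counts[label].update(tokens)
            self.query_counts[label] += 1
            self.vocab.update(tokens)

    def log_score(self, query, label):
        """Log prior plus smoothed log likelihood of the query's words."""
        total_queries = sum(self.query_counts.values())
        score = math.log(self.query_counts[label] / total_queries)
        total_words = sum(self.word_counts[label].values())
        for token in tokenize(query):
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def predict(self, query):
        """Return the label with the higher score."""
        return max(("uoi", "other"), key=lambda label: self.log_score(query, label))


# Hypothetical short-term history known to the adversarial search engine.
history = [
    ("python list comprehension", "uoi"),
    ("python dict sort by value", "uoi"),
    ("numpy array reshape", "uoi"),
    ("cheap flights to boston", "other"),
    ("chicken soup recipe", "other"),
    ("local weather forecast", "other"),
]

clf = QueryClassifier()
clf.train(history)

# Hypothetical pool of queries arriving over the anonymizing network.
pool = ["python sort list", "boston weather forecast"]
extracted = [q for q in pool if clf.predict(q) == "uoi"]
print(extracted)
```

In this toy setting the classifier relies only on query content, which is the kind of information the abstract says remains visible to the search engine even when the sender's network identity is hidden.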