Privacy-preserving query log mining for business confidentiality protection

  • Authors:
  • Barbara Poblete; Myra Spiliopoulou; Ricardo Baeza-Yates

  • Affiliations:
  • Yahoo! Research Chile, Santiago, Chile; Otto von Guericke University, Magdeburg, Germany; Yahoo! Research Spain, Barcelona, Spain

  • Venue:
  • ACM Transactions on the Web (TWEB)
  • Year:
  • 2010

Abstract

We introduce the concern of confidentiality protection of business information for the publication of search engine query logs and derived data. We study business confidentiality as the protection of nonpublic data from institutions, such as companies and people in the public eye. In particular, we relate this concern to the involuntary exposure of confidential Web site information, and we transfer this problem into the field of privacy-preserving data mining. We characterize the possible adversaries interested in disclosing confidential Web site data and the attack strategies that they could use. These attacks exploit different vulnerabilities found in query logs, and we present several anonymization heuristics to prevent them. We perform an experimental evaluation to estimate the remaining utility of the log after applying our anonymization techniques. Our experimental results show that a query log can be anonymized against these specific attacks while retaining a significant volume of useful data.