Preventing automatic user profiling in Web 2.0 applications

  • Authors:
  • Alexandre Viejo;David Sánchez;Jordi Castellà-Roca

  • Affiliations:
  • Departament d'Enginyeria Informàtica i Matemàtiques, UNESCO Chair in Data Privacy, Universitat Rovira i Virgili, Av. Països Catalans 26, E-43007 Tarragona, Spain;Departament d'Enginyeria Informàtica i Matemàtiques, UNESCO Chair in Data Privacy, Universitat Rovira i Virgili, Av. Països Catalans 26, E-43007 Tarragona, Spain;Departament d'Enginyeria Informàtica i Matemàtiques, UNESCO Chair in Data Privacy, Universitat Rovira i Virgili, Av. Països Catalans 26, E-43007 Tarragona, Spain

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2012

Abstract

The rise of the Internet and Web 2.0 platforms has brought highly accessible publishing techniques that have transformed users from mere content consumers into full consumer-producers of content. Previous work has shown that user-generated content can be automatically analyzed to extract information useful to society. However, researchers have also shown that individual user profiles can be built automatically, which may concern users who are worried about their privacy. In this paper, we present a new scheme that effectively obfuscates the user's real profile from automatic profiling systems while keeping her publications intact, so as to interfere as little as possible with her readers. The proposed system generates and publishes fake messages containing terms semantically correlated with the user's posts in order to distort, and hence hide, the real profile. Our method has been tested on Twitter, a well-known Web 2.0 microblogging platform. Evaluation results show that the new scheme effectively distorts user profiles, producing uniform (i.e., balanced) profiles that hardly characterize users, and that it outperforms simpler methods based on random distortions. In addition, the presented system is adaptive: it can profile and anonymize users who have only a limited number of publications, and it reacts quickly to any variation in their interests.
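
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of balancing a profile with fake posts, not the authors' method. It assumes a profile can be modeled as a count of posts per topic category, that uniformity can be measured with Shannon entropy, and that fake posts are assigned greedily to the least-represented category. The category names, the functions `profile`, `entropy`, and `plan_fake_posts`, and the `budget` parameter are all hypothetical. Choosing terms that are semantically correlated with the user's posts is not modeled here.

```python
import math
from collections import Counter


def profile(counts):
    """Normalize per-category post counts into a probability distribution."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def entropy(dist):
    """Shannon entropy in bits; a higher value means a more uniform profile."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def plan_fake_posts(counts, budget):
    """Assign `budget` fake posts, one at a time, to the least-represented category.

    Assumption: this greedy rule stands in for whatever selection
    strategy the paper actually uses. Ties are broken by category name.
    """
    counts = Counter(counts)
    plan = []
    for _ in range(budget):
        target = min(counts, key=lambda c: (counts[c], c))
        counts[target] += 1
        plan.append(target)
    return plan, counts


# Hypothetical user whose real posts are dominated by one topic.
real = {"sports": 8, "politics": 1, "technology": 1}
plan, distorted = plan_fake_posts(real, budget=14)

print("entropy before:", round(entropy(profile(real)), 3))
print("entropy after: ", round(entropy(profile(distorted)), 3))
```

In this toy setting the real posts are never modified; only the added fake posts change the category distribution, which mirrors the abstract's requirement that the user's publications stay intact.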