Predicting age and gender in online social networks

  • Authors:
  • Claudia Peersman;Walter Daelemans;Leona Van Vaerenbergh

  • Affiliations:
  • Antwerp University & artesis, Antwerp, Belgium;Antwerp University, Antwerp, Belgium;artesis, Antwerp, Belgium

  • Venue:
  • Proceedings of the 3rd international workshop on Search and mining user-generated contents
  • Year:
  • 2011

Abstract

Communication on online social networks typically happens via short messages, often using non-standard language variations. These characteristics make this type of text a challenging genre for natural language processing. Moreover, in these digital communities it is easy to provide a false name, age, gender and location in order to hide one's true identity, which gives criminals such as pedophiles new possibilities to groom their victims. It would therefore be useful if user profiles could be checked on the basis of text analysis, and false profiles flagged for monitoring. This paper presents an exploratory study in which we apply a text categorization approach to the prediction of age and gender on a corpus of chat texts, which we collected from the Belgian social networking site Netlog. We examine which types of features are most informative for a reliable prediction of age and gender on this difficult text type, and we perform experiments with different data set sizes to gain more insight into the minimum data size requirements for this task.