Discriminating gender on Twitter

  • Authors:
  • John D. Burger; John Henderson; George Kim; Guido Zarrella

  • Affiliations:
  • The MITRE Corporation, Bedford, Massachusetts; The MITRE Corporation, Bedford, Massachusetts; The MITRE Corporation, Bedford, Massachusetts; The MITRE Corporation, Bedford, Massachusetts

  • Venue:
  • EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2011

Abstract

Accurate prediction of demographic attributes from social media and other informal online content is valuable for marketing, personalization, and legal investigation. This paper describes the construction of a large, multilingual dataset labeled with gender, and investigates statistical models for determining the gender of uncharacterized Twitter users. We explore several different classifier types on this dataset. We show how much classifier accuracy varies with tweet volume and with the kinds of profile metadata included in the models. We also perform a large-scale human assessment using Amazon Mechanical Turk. Our methods significantly outperform both baseline models and almost all humans on the same task.