Discriminating gender on Twitter

Authors:
John D. Burger; John Henderson; George Kim; Guido Zarrella
Affiliation:
The MITRE Corporation, Bedford, Massachusetts (all authors)
Venue:
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2011

Abstract

Accurate prediction of demographic attributes from social media and other informal online content is valuable for marketing, personalization, and legal investigation. This paper describes the construction of a large, multilingual dataset labeled with gender, and investigates statistical models for determining the gender of uncharacterized Twitter users. We explore several classifier types on this dataset. We show how classifier accuracy varies with tweet volume and with the inclusion of various kinds of profile metadata in the models. We also perform a large-scale human assessment using Amazon Mechanical Turk. Our methods significantly outperform both baseline models and almost all humans on the same task.