Accurate prediction of demographic attributes from social media and other informal online content is valuable for marketing, personalization, and legal investigation. This paper describes the construction of a large, multilingual dataset labeled with gender and investigates statistical models for determining the gender of uncharacterized Twitter users. We explore several classifier types on this dataset. We show how classifier accuracy varies with tweet volume and with the kinds of profile metadata included in the models. We also perform a large-scale human assessment using Amazon Mechanical Turk. Our methods significantly outperform both baseline models and almost all humans on the same task.