Language independent gender classification on Twitter

Authors:
Jalal S. Alowibdi; Ugo A. Buy; Philip Yu
Affiliations:
University of Illinois at Chicago and King Abdulaziz University; University of Illinois at Chicago; University of Illinois at Chicago and King Abdulaziz University
Venue:
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Year:
2013

Abstract

Online Social Networks (OSNs) generate a huge volume of user-originated text. Gender classification of OSN users can serve multiple purposes. For example, commercial organizations can use it for advertising, law enforcement may use it as part of legal investigations, and others may use gender information for social reasons. Here we explore language-independent gender classification. Our approach predicts gender using five color-based features extracted from Twitter profiles (e.g., the background color in a user's profile page). Most other methods for gender prediction are language dependent: they use high-dimensional spaces consisting of unique words extracted from text fields such as postings, user names, and profile descriptions. Our approach is independent of the user's language, efficient, and scalable, while attaining a good level of accuracy. We demonstrate the validity of our approach by evaluating different classifiers over a large dataset of Twitter profiles.
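
The idea of turning profile colors into a small, language-independent feature vector can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the authors' method: the five field names are modeled on the color fields of legacy Twitter user objects (the abstract names only the background color), the RGB encoding is one of several plausible encodings, and the nearest-centroid classifier is a stand-in chosen for brevity, not one of the classifiers evaluated in the paper. The class labels are neutral placeholders and no real data is implied.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed field names (hypothetical for this sketch); the abstract only
# mentions "five color-based features" and the background color by name.
COLOR_FIELDS = (
    "profile_background_color",
    "profile_text_color",
    "profile_link_color",
    "profile_sidebar_fill_color",
    "profile_sidebar_border_color",
)


def hex_to_rgb(color: str) -> Tuple[int, int, int]:
    """Convert a 6-digit hex color such as 'C0DEED' or '#c0deed' to (r, g, b)."""
    value = color.strip().lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected a 6-digit hex color, got {color!r}")
    return int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16)


def profile_to_features(profile: Dict[str, str]) -> List[float]:
    """Map the five color fields to a 15-dimensional vector scaled to [0, 1]."""
    features: List[float] = []
    for field in COLOR_FIELDS:
        features.extend(channel / 255.0 for channel in hex_to_rgb(profile[field]))
    return features


class NearestCentroid:
    """Toy stand-in classifier: predict the label with the closest mean vector."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        # Per-label mean of the training vectors.
        self.centroids_ = {
            label: [value / counts[label] for value in acc]
            for label, acc in sums.items()
        }
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[str]:
        def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda label: sq_dist(row, self.centroids_[label]))
            for row in X
        ]
```

Because the features depend only on color values, the same pipeline applies regardless of the language a user writes in, and the feature space stays at 15 dimensions under this encoding instead of growing with the vocabulary.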