In this work, we provide an empirical analysis of differences in word use between genders in telephone conversations, which complements the considerable body of work in sociolinguistics concerned with gender-based linguistic differences. Experiments are performed on a large speech corpus of roughly 12,000 conversations. We employ machine learning techniques to automatically categorize the gender of each speaker given only the transcript of that speaker's side of the conversation, achieving 92% accuracy. An analysis of the most characteristic words for each gender is also presented. Experiments reveal that the gender of the speaker on one side of a conversation influences the word choice of the speaker on the other side. A surprising result is that we were able to classify male-only vs. female-only conversations with almost perfect accuracy.
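The abstract does not name the classifier or the word-ranking measure used, so the following is only a minimal sketch of the general idea: a bag-of-words classifier over transcripts, plus a ranking of the words most characteristic of one class. It assumes multinomial Naive Bayes with add-one smoothing and a log-likelihood-ratio ranking, neither of which is confirmed by the text. The function names (`train_naive_bayes`, `predict`, `characteristic_words`) and the whitespace tokenization are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def train_naive_bayes(transcripts, labels):
    """Collect per-class word counts from whitespace-tokenized transcripts.

    Assumption: multinomial Naive Bayes stands in for whatever classifier
    the authors actually used.
    """
    word_counts = {}               # label -> Counter of word frequencies
    doc_counts = Counter(labels)   # label -> number of transcripts
    for text, label in zip(transcripts, labels):
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def log_likelihood(word, label, model):
    """Add-one smoothed log P(word | label)."""
    word_counts, _, vocab = model
    counts = word_counts[label]
    return math.log((counts[word] + 1) / (sum(counts.values()) + len(vocab)))


def predict(text, model):
    """Return the label with the highest posterior log score for a transcript."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += log_likelihood(word, label, model)
        scores[label] = score
    return max(scores, key=scores.get)


def characteristic_words(model, label, other, top_n=3):
    """Rank words by log P(word | label) - log P(word | other).

    Assumption: this log-likelihood ratio is one common way to surface
    class-specific words; the abstract does not say which measure was used.
    """
    _, _, vocab = model

    def ratio(word):
        return log_likelihood(word, label, model) - log_likelihood(word, other, model)

    return sorted(vocab, key=ratio, reverse=True)[:top_n]
```

In use, `transcripts` would hold one string per conversation side and `labels` the corresponding speaker class. A sketch like this says nothing about the reported 92% accuracy, which depends on the corpus, the features, and the learner actually used.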