A quantitative analysis of lexical differences between genders in telephone conversations

Authors:
Constantinos Boulis; Mari Ostendorf
Affiliations:
University of Washington, Seattle; University of Washington, Seattle
Venue:
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Year:
2005

Abstract

In this work, we provide an empirical analysis of differences in word use between genders in telephone conversations, complementing the considerable body of sociolinguistic work on gender-related linguistic differences. Experiments are performed on a large speech corpus of roughly 12,000 conversations. We use machine learning techniques to automatically classify the gender of each speaker given only the transcript of his/her speech, achieving 92% accuracy. We also present an analysis of the most characteristic words for each gender. Experiments reveal that the gender of the speaker on one side of a conversation influences the lexical choices of the speaker on the other side. A surprising result is that we were able to classify male-only vs. female-only conversations with almost perfect accuracy.
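The abstract does not say which classifiers or word-ranking metrics were used. As a hedged illustration only, the sketch below assumes a multinomial naive Bayes classifier over bag-of-words transcripts (a common text-classification baseline, not necessarily the authors' method) and ranks "characteristic" words by a smoothed log-likelihood ratio between the two classes. The function names `train_nb`, `predict`, and `characteristic_words`, the smoothing constant `alpha`, and the toy tokens are all hypothetical.

```python
import math
from collections import Counter

# ASSUMPTION: multinomial naive Bayes is used here purely for illustration;
# it is not claimed to be the classifier used in the paper.

def train_nb(docs, labels, alpha=1.0):
    """Train a multinomial naive Bayes model on tokenized transcripts.

    docs: list of token lists; labels: class name per doc (e.g. "F", "M").
    alpha: add-alpha (Laplace) smoothing constant (hypothetical default).
    """
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d}
    counts = {c: Counter() for c in classes}
    n_docs = Counter(labels)
    for d, y in zip(docs, labels):
        counts[y].update(d)
    # Log priors from class frequencies.
    log_prior = {c: math.log(n_docs[c] / len(docs)) for c in classes}
    # Smoothed log P(word | class) for every vocabulary word.
    log_lik = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[c][w] + alpha) / total) for w in vocab}
    return classes, log_prior, log_lik

def predict(model, doc):
    """Return the class with the highest log-posterior score for one transcript."""
    classes, log_prior, log_lik = model

    def score(c):
        # Words not seen in training are skipped.
        return log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])

    return max(classes, key=score)

def characteristic_words(model, a, b, k=5):
    """Top-k words by log P(w|a) - log P(w|b); ties broken alphabetically."""
    _, _, log_lik = model
    ratio = {w: log_lik[a][w] - log_lik[b][w] for w in log_lik[a]}
    return sorted(ratio, key=lambda w: (-ratio[w], w))[:k]

# Toy illustration with made-up tokens (not data or results from the paper).
docs = [["common", "x", "x"], ["common", "x", "z"],
        ["common", "y", "y"], ["common", "y", "z"]]
labels = ["F", "F", "M", "M"]
model = train_nb(docs, labels)
print(predict(model, ["x", "common"]))             # F
print(characteristic_words(model, "F", "M", k=1))  # ['x']
```

Scoring in log space avoids numerical underflow on long transcripts, and add-alpha smoothing keeps a word that never occurs with one class from forcing that class's probability to zero.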