Inferring gender of movie reviewers: exploiting writing style, content and metadata

Authors:
Jahna Otterbacher
Affiliations:
Illinois Institute of Technology, Chicago, IL, USA
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010


Abstract

Despite differences in the way that men and women experience goods and communicate their perspectives, online review communities typically do not provide participants' gender. We propose to infer author gender, given a set of reviews of a particular item, and experiment on reviews posted at the Internet Movie Database (IMDb). Using logistic regression, we explore the contribution of three types of information: 1) style, 2) content, and 3) metadata (e.g., review age, social feedback). Our results concur with previous research, in that there are salient differences in writing style and content between reviews authored by men versus women. However, in comparison to literary or scientific texts, to which classification tasks are often applied, reviews are brief and occur within the context of an ongoing discourse. Therefore, to compensate for the brevity of reviews, content and stylistic features can be augmented with metadata. We find in particular that the perceived utility of a review is an important correlate of gender. The model incorporating all features has a classification accuracy of 73.7% and is not as sensitive to review length as are those based only on stylistic or content features.
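
As a rough illustration of the setup the abstract describes, the sketch below concatenates style, content, and metadata features into one vector and fits a logistic regression by plain gradient descent. It is a minimal sketch under stated assumptions, not the paper's implementation: the feature names, the four toy rows, the abstract class labels 0 and 1, and the helpers combine_features, train_logistic_regression, and predict are all hypothetical and chosen only to make the example run.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def combine_features(style, content, metadata):
    # Concatenate the three feature groups into a single vector.
    return list(style) + list(content) + list(metadata)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    # Batch gradient descent on the mean log loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Return the more probable class (0 or 1).
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical toy data, invented for illustration only (not from the paper).
# Each row: one style feature, one content feature, one metadata feature,
# all scaled to [0, 1]. The labels are abstract classes 0 and 1.
X = [
    combine_features([0.8], [0.1], [0.9]),
    combine_features([0.7], [0.2], [0.8]),
    combine_features([0.2], [0.9], [0.3]),
    combine_features([0.1], [0.8], [0.2]),
]
y = [1, 1, 0, 0]

w, b = train_logistic_regression(X, y)
predictions = [predict(w, b, x) for x in X]
```

Training on a subset of the groups (for example, passing an empty list for metadata in every row) gives a simple way to compare the contribution of each feature type, in the spirit of the comparison the abstract reports.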