Content-based similarity measures of weblog authors

Authors:
Christopher Wienberg;Melissa Roemmele;Andrew S. Gordon
Affiliations:
University of Southern California, Los Angeles, CA;University of Southern California, Los Angeles, CA;University of Southern California, Los Angeles, CA
Venue:
Proceedings of the 5th Annual ACM Web Science Conference
Year:
2013

Abstract

With recent research interest in the confounding roles of homophily and contagion in studies of social influence, there is a strong need for reliable content-based measures of the similarity between people. In this paper, we investigate the use of text similarity measures as a way of predicting the similarity of prolific weblog authors. We describe a novel method of collecting human judgments of the overall similarity between two authors, as well as of their similarity in demographics, politics, culture, religion, values, hobbies/interests, personality, and writing style. We then apply a range of automated textual similarity measures based on word frequency counts and calculate their statistical correlation with the human judgments. Our findings indicate that commonly used text similarity measures do not correlate well with human judgments of author similarity. However, various measures that pay special attention to personal pronouns and their context correlate significantly with different facets of similarity.
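
The kind of pipeline the abstract describes can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which similarity measures or which correlation statistic were used, so cosine similarity over raw word counts and Spearman rank correlation are stand-ins chosen here. The function `pronoun_context_counts` and its one-word context window are likewise hypothetical, meant only to show one way a measure could "pay special attention to personal pronouns and their context".

```python
import math
import re
from collections import Counter

# Assumption: a small, hand-picked pronoun list; the paper's list is not given.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your",
            "he", "him", "his", "she", "her", "they", "them", "their"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_counts(text):
    """Word frequency counts over all tokens."""
    return Counter(tokenize(text))

def pronoun_context_counts(text):
    """Hypothetical measure: count only words that directly follow a pronoun."""
    tokens = tokenize(text)
    counts = Counter()
    for current, following in zip(tokens, tokens[1:]):
        if current in PRONOUNS:
            counts[following] += 1
    return counts

def cosine_similarity(a, b):
    """Cosine similarity between two count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def correlate_with_judgments(author_pairs, human_scores, featurize=word_counts):
    """Score each (text_a, text_b) pair automatically, then correlate the
    automatic scores with the human similarity ratings for the same pairs."""
    auto_scores = [cosine_similarity(featurize(a), featurize(b))
                   for a, b in author_pairs]
    return spearman(auto_scores, human_scores)
```

Passing `featurize=pronoun_context_counts` to `correlate_with_judgments` swaps in the pronoun-focused variant, so the same set of human ratings can be compared against both kinds of measure.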