Personality is a fundamental component of an individual's affective behavior. Previous work on personality classification has emerged from disparate sources: the variety of algorithms and feature-selection methods applied across spoken and written data has made comparison difficult. Here, we use a large corpus of blogs to compare feature-selection approaches for classification; we also use these results to identify characteristic language information relating to personality. Using Support Vector Machines, the best accuracies range from 70.51% (neuroticism) to 84.36% (openness to experience). The best-performing feature configuration combined: (1) stemmed bigrams; (2) no exclusion of stopwords (i.e., common words); and (3) boolean features that record the presence or absence of each feature rather than its rate of use. We take these findings to suggest that both the structure of the text and the presence of common words are important. We also note that a common dictionary of words used for content analysis (LIWC) performs less well in this classification task, which we propose is due to the conceptual breadth of its categories. To get a better sense of how personality is expressed in the blogs, we explore the best-performing features and discuss how these can provide a deeper understanding of personality language behavior online.
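The feature configuration described above (stemmed bigrams, stopwords retained, boolean presence rather than rate of use) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `crude_stem` suffix stripper is a hypothetical stand-in for whatever stemmer was actually used, the tokenizer and the two example posts are invented for illustration, and the function names are assumptions. A rate-based variant is included only to show the contrast the abstract draws.

```python
import re


def crude_stem(token):
    """Toy suffix stripper; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stemmed_bigrams(text):
    """Lowercase, tokenize, stem, and pair adjacent tokens.

    Stopwords are deliberately NOT removed, so pairs such as
    ("the", "party") survive as features.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    stems = [crude_stem(t) for t in tokens]
    return list(zip(stems, stems[1:]))


def boolean_features(text, vocabulary):
    """1 if the bigram occurs at all in the text, else 0."""
    present = set(stemmed_bigrams(text))
    return [1 if bigram in present else 0 for bigram in vocabulary]


def rate_features(text, vocabulary):
    """Relative frequency of each bigram, shown for contrast only."""
    bigrams = stemmed_bigrams(text)
    total = len(bigrams) or 1  # avoid division by zero on very short texts
    return [bigrams.count(bigram) / total for bigram in vocabulary]


# Invented example posts, not taken from the blog corpus.
posts = [
    "I was thinking about the party",
    "The party was great and I loved the party",
]
vocabulary = sorted({b for post in posts for b in stemmed_bigrams(post)})

for post in posts:
    print(boolean_features(post, vocabulary))
```

Each row printed is a 0/1 vector over the shared bigram vocabulary. In a setup like the one the abstract describes, vectors of this kind would be passed, together with per-trait labels, to a Support Vector Machine implementation; that step is omitted here because the abstract gives no details of the classifier settings.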