Using psycholinguistic features for profiling first language of authors

Authors:
Rosemary Torney; Peter Vamplew; John Yearwood
Affiliations:
Internet Commerce Security Lab, University of Ballarat, Australia; School of Science, Information Technology and Engineering, University of Ballarat, Australia; School of Science, Information Technology and Engineering, University of Ballarat, Australia
Venue:
Journal of the American Society for Information Science and Technology
Year:
2012

Abstract

This study empirically evaluates the effectiveness of different feature types for classifying the first language of an author. In particular, it examines the utility of psycholinguistic features extracted by the Linguistic Inquiry and Word Count (LIWC) tool, which have not previously been applied to author profiling. Because LIWC was developed in psycholinguistics rather than computational linguistics, it was hypothesized to be effective both as a single-type feature set, owing to its psycholinguistic basis, and in combination with other feature sets, because it should differ enough from them to add insight rather than redundancy. LIWC features were found to be competitive with previously used feature types in identifying an author's first language, and combined feature sets that included LIWC features consistently achieved higher accuracy and average F-measures than the same feature sets without them. As a secondary issue, this study also examined how well first-language classification scales to a larger number of candidate languages. The classification scheme scaled effectively to the full 16-language collection of the International Corpus of Learner English, compared with results achieved on just 5 languages in previous research. © 2012 Wiley Periodicals, Inc.
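The abstract compares single feature sets with combined sets that add LIWC features, scoring each by accuracy and average F-measure. The following is a minimal sketch of those ideas under loudly stated assumptions, not the authors' implementation. LIWC's dictionary is proprietary, so `TOY_CATEGORIES` is a hypothetical four-category lexicon that only mimics LIWC's output format (percentage of words per category, with `*` as a stem wildcard). `FUNCTION_WORDS` is a hypothetical stand-in for one of the "previously used feature types". Combination is shown as plain concatenation of feature vectors. "Average F-measure" is read here as the macro-average of per-language F1, which the abstract does not confirm. The classifier itself is omitted.

```python
import re
from collections import Counter

# HYPOTHETICAL toy lexicon standing in for LIWC's proprietary dictionary.
# A pattern ending in "*" matches any word starting with that prefix.
TOY_CATEGORIES = {
    "pronoun": ["i", "me", "my", "we", "you", "he", "she", "they", "it"],
    "posemo": ["good", "happ*", "love*", "nice"],
    "negemo": ["bad", "hate*", "sad*", "angr*"],
    "cogmech": ["think*", "know*", "because", "reason*"],
}

# HYPOTHETICAL second feature set: relative frequencies of a few function words.
FUNCTION_WORDS = ["the", "a", "of", "to", "in", "and"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def _matches(word, pattern):
    """Exact match, or prefix match when the pattern ends in '*'."""
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def liwc_style_features(text, categories=TOY_CATEGORIES):
    """Percentage of tokens in each category, in sorted category-name order."""
    tokens = tokenize(text)
    total = len(tokens)
    features = []
    for name in sorted(categories):
        hits = sum(1 for t in tokens if any(_matches(t, p) for p in categories[name]))
        features.append(100.0 * hits / total if total else 0.0)
    return features


def function_word_features(text, vocab=FUNCTION_WORDS):
    """Relative frequency of each function word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total if total else 0.0 for w in vocab]


def combined_features(text):
    """Combine the two feature sets by concatenating their vectors."""
    return function_word_features(text) + liwc_style_features(text)


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted first language is correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-language F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)
```

For example, `liwc_style_features("I love this good day")` returns `[0.0, 0.0, 40.0, 20.0]` (cogmech, negemo, posemo, pronoun), and `combined_features` yields a 10-dimensional vector that could be passed to any classifier, such as an SVM. Macro-averaging weights each language equally, so performance on rarer languages counts as much as on common ones when the label set grows from 5 to 16 languages.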