This study empirically evaluates how effective different feature types are for classifying an author's first language. In particular, it examines the utility of psycholinguistic features extracted by the Linguistic Inquiry and Word Count (LIWC) tool, which have not previously been applied to the task of author profiling. Because LIWC was developed in psycholinguistics rather than computational linguistics, it was hypothesized that it would be effective both as a single-type feature set, because of its psycholinguistic basis, and in combination with other feature sets, because it should be sufficiently different to add insight rather than redundancy. LIWC features proved competitive with previously used feature types in identifying an author's first language, and combined feature sets that included LIWC features consistently achieved higher accuracy rates and average F-measures than the same feature sets without them. As a secondary issue, the study examined how well first language classification scales to a larger number of candidate languages. The classification scheme scaled effectively to the entire 16-language collection from the International Corpus of Learner English, when compared with results achieved on just 5 languages in previous research. © 2012 Wiley Periodicals, Inc.
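The comparison described above, scoring a feature set with and without an added feature block by accuracy and average F-measure, can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how the F-measure is averaged, so the sketch uses an unweighted (macro) average; the function names `accuracy`, `macro_f1`, and `combine` are hypothetical; and the labels and feature values are made-up toy data, not results from the study.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of documents whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the classes present in gold.

    Assumption: macro averaging; the abstract does not specify the scheme.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    scores = []
    for label in sorted(set(gold)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def combine(*feature_sets):
    """Concatenate per-document feature vectors from several feature sets."""
    return [sum((list(v) for v in vectors), []) for vectors in zip(*feature_sets)]


# Toy data (hypothetical): two documents, a baseline block and an extra block.
baseline = [[1, 2], [3, 4]]
extra = [[5], [6]]
combined = combine(baseline, extra)

# Toy predictions (hypothetical) for four documents and two first languages.
gold = ["fr", "fr", "de", "de"]
pred = ["fr", "de", "de", "de"]
print(combined, accuracy(gold, pred), round(macro_f1(gold, pred), 4))
```

In practice the combined vectors would be passed to whatever classifier is being evaluated, and the two scores would be computed once for the baseline feature set and once for the combined one.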