Computational methods in authorship attribution

Authors:
Moshe Koppel; Jonathan Schler; Shlomo Argamon
Affiliations:
Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel; Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel; Department of Computer Science, Illinois Institute of Technology, 10 W. 31st Street, Chicago, IL 60616
Venue:
Journal of the American Society for Information Science and Technology
Year:
2009

Abstract

Statistical authorship attribution has a long history, culminating in the use of modern machine learning classification methods. Nevertheless, most of this work suffers from the limitation of assuming a small closed set of candidate authors and essentially unlimited training text for each. Real-life authorship attribution problems, however, typically fall short of this ideal. Thus, following detailed discussion of previous work, three scenarios are considered here for which solutions to the basic attribution problem are inadequate. In the first variant, the profiling problem, there is no candidate set at all; in this case, the challenge is to provide as much demographic or psychological information as possible about the author. In the second variant, the needle-in-a-haystack problem, there are many thousands of candidates for each of whom we might have a very limited writing sample. In the third variant, the verification problem, there is no closed candidate set but there is one suspect; in this case, the challenge is to determine if the suspect is or is not the author. For each variant, it is shown how machine learning methods can be adapted to handle the special challenges of that variant. © 2009 Wiley Periodicals, Inc.
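
To make the problem variants concrete, the following is a minimal sketch of how the basic closed-set attribution problem and the verification problem differ in form. It is not the method of the paper: the character trigram features, the cosine similarity measure, the fixed threshold of 0.5, and the function names `char_ngrams`, `cosine`, `attribute`, and `verify` are all illustrative assumptions chosen for brevity.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased text (assumed feature set)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(unknown, candidates, n=3):
    """Closed-set attribution: pick the candidate whose sample is most similar."""
    profile = char_ngrams(unknown, n)
    scores = {
        name: cosine(profile, char_ngrams(sample, n))
        for name, sample in candidates.items()
    }
    return max(scores, key=scores.get), scores


def verify(unknown, suspect_sample, threshold=0.5, n=3):
    """Verification: one suspect, no closed set; accept only above a threshold.

    The threshold value is a hypothetical placeholder, not a recommended setting.
    """
    similarity = cosine(char_ngrams(unknown, n), char_ngrams(suspect_sample, n))
    return similarity >= threshold
```

In the closed-set case, `attribute` always returns some candidate, which is exactly the assumption that fails when the true author may be outside the set. `verify` instead has to make an absolute decision about a single suspect, and a single similarity threshold on limited text is a crude way to do that.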