Authorship attribution with thousands of candidate authors

Authors:
Moshe Koppel; Jonathan Schler; Shlomo Argamon; Eran Messeri
Affiliations:
Bar Ilan University, Ramat Gan, Israel; Bar Ilan University, Ramat Gan, Israel; Illinois Institute of Technology, Chicago, IL; Bar Ilan University, Ramat Gan, Israel
Venue:
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2006

Citing 4

Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal

Authorship Attribution with Support Vector Machines
Applied Intelligence

Augmenting Naive Bayes Classifiers with Statistical Language Models
Information Retrieval

The myth of the double-blind review?: author identification using only citations
ACM SIGKDD Explorations Newsletter

Cited 10

Web-based inference detection
SS'07 Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium

A survey of modern authorship attribution methods
Journal of the American Society for Information Science and Technology

Authorship attribution and verification with many authors and limited data
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1

Measuring the interestingness of articles in a limited user environment
Information Processing and Management: an International Journal

Intrinsic plagiarism analysis
Language Resources and Evaluation

Authorship attribution in the wild
Language Resources and Evaluation

Authorship classification: a discriminative syntactic tree mining approach
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

Author identification in bengali literary works
PReMI'11 Proceedings of the 4th international conference on Pattern recognition and machine intelligence

An information theoretic framework for web inference detection
Proceedings of the 5th ACM workshop on Security and artificial intelligence

Web search query privacy: Evaluating query obfuscation and anonymizing networks
Journal of Computer Security

Quantified Score

Hi-index	0.00

Abstract

In this paper, we use a blog corpus to demonstrate that we can often identify the author of an anonymous text even where there are many thousands of candidate authors. Our approach combines standard information retrieval methods with a text categorization meta-learning scheme that determines when to even venture a guess.
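
The general idea of ranking candidate authors by retrieval-style similarity and declining to answer when the evidence is weak can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the features, the similarity measure, or the meta-learner, so the sketch uses word-level tf-idf with cosine similarity and replaces the meta-learning scheme with a simple margin test between the two best candidates. The names `attribute` and `min_margin` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse tf-idf dicts; also return the idf table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf (an assumption; any standard weighting would do here).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def attribute(snippet, known, min_margin=0.1):
    """Return the best-matching author, or None to abstain.

    `known` maps author name -> known text by that author.
    Abstains when the top score is zero or when it does not beat the
    runner-up by at least `min_margin` (a stand-in for the paper's
    meta-learner, which is not described in the abstract).
    """
    authors = list(known)
    vecs, idf = tfidf_vectors([known[a].lower().split() for a in authors])
    counts = Counter(snippet.lower().split())
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = sorted(
        ((cosine(query, v), a) for v, a in zip(vecs, authors)), reverse=True
    )
    best_score, best_author = scores[0]
    second_score = scores[1][0] if len(scores) > 1 else 0.0
    if best_score == 0.0 or best_score - second_score < min_margin:
        return None
    return best_author


known = {
    "alice": "the cat sat on the mat and the cat purred softly",
    "bob": "stock markets fell sharply as investors sold shares",
    "carol": "the recipe needs flour sugar butter and two eggs",
}
print(attribute("the cat purred on the mat", known))
print(attribute("quantum entanglement experiments", known))
```

With thousands of candidates, the abstention step matters more than the ranking itself: a larger `min_margin` yields fewer guesses but, in principle, more reliable ones. A learned classifier over such score features would be closer in spirit to the meta-learning scheme the abstract mentions.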