Journal of the American Society for Information Science and Technology
Authorship attribution with thousands of candidate authors
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Measuring Differentiability: Unmasking Pseudonymous Authors
The Journal of Machine Learning Research
ACM Transactions on Information Systems (TOIS)
Foundations and Trends in Information Retrieval
Computational methods in authorship attribution
Journal of the American Society for Information Science and Technology
A survey of modern authorship attribution methods
Journal of the American Society for Information Science and Technology
Authorship attribution and verification with many authors and limited data
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Effective and scalable authorship attribution using function words
AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
Lost in translation: authorship attribution using frame semantics
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Authorship attribution with latent Dirichlet allocation
CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
Authorship attribution with author-aware topic models
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
On the role of poetic versus nonpoetic features in “kindred” and diachronic poetry attribution
Journal of the American Society for Information Science and Technology
Expert Systems with Applications: An International Journal
Open-Set classification for automated genre identification
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Explanation in computational stylometry
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2
The use of orthogonal similarity relations in the prediction of authorship
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2
Syntactic dependency-based n-grams: more evidence of usefulness in classification
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Syntactic dependency-based n-grams as classification features
MICAI'12 Proceedings of the 11th Mexican international conference on Advances in Computational Intelligence - Volume Part II
Syntactic N-grams as machine learning features for natural language processing
Expert Systems with Applications: An International Journal
Hi-index | 0.00 |
Most previous work on authorship attribution has focused on the case in which we need to attribute an anonymous document to one of a small set of candidate authors. In this paper, we consider authorship attribution as found in the wild: the set of known candidates is extremely large (possibly many thousands) and might not even include the actual author. Moreover, the known texts and the anonymous texts might be of limited length. We show that even in these difficult cases, we can use similarity-based methods along with multiple randomized feature sets to achieve high precision. Moreover, we show the precise relationship between attribution precision and four parameters: the size of the candidate set, the quantity of known-text by the candidates, the length of the anonymous text and a certain robustness score associated with a attribution.