The problem of authorship attribution, that is, attributing texts to their original authors, has been an active research area since the end of the 19th century and has attracted increased interest in the last decade. Most work on authorship attribution focuses on scenarios with only a few candidate authors; recently considered cases with tens to thousands of candidate authors have proved much more challenging. In this paper, we propose ways of employing Latent Dirichlet Allocation (LDA) in authorship attribution. We show that our approach yields state-of-the-art performance for both few and many candidate authors, provided that these authors wrote enough texts to be modelled effectively.