Most studies in statistical or machine-learning-based authorship attribution focus on two or only a few authors. This leads to an overestimation of the importance of the features that are extracted from the training data and found to be discriminating for these small sets of authors. Most studies also use amounts of training data that are unrealistic for the situations in which stylometry is applied (e.g., forensics), and thereby overestimate the accuracy of their approach in those situations. A more realistic interpretation of the task is as an authorship verification problem, which we approximate by pooling data from many different authors as negative examples. In this paper, we show, on the basis of a new corpus with 145 authors, how a large number of candidate authors affects feature selection and learning. We also show that a memory-based learning approach is robust in authorship attribution and verification with many authors and limited training data when compared to eager learning methods such as SVMs and maximum entropy learning.
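The verification setup described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`make_verification_set`, `knn_verify`), the toy feature vectors, the Manhattan distance, and the choice of k = 3 are all assumptions made for illustration. It shows only the two ideas named in the abstract: relabeling a multi-author corpus so that all other authors are pooled as negative examples, and classifying a new text with a memory-based (nearest-neighbour) learner that stores every training instance.

```python
from collections import Counter


def make_verification_set(docs, target_author):
    """Relabel (author, features) pairs as one-vs-rest.

    Texts by the target author become positives (True); texts by every
    other author are pooled together as negatives (False).
    """
    return [(features, author == target_author) for author, features in docs]


def knn_verify(train, query, k=3):
    """Memory-based vote: keep all training instances and label the query
    by the majority label among its k nearest stored instances."""

    def distance(a, b):
        # Manhattan distance over the union of feature names.
        # Assumption: the abstract does not specify a metric.
        keys = set(a) | set(b)
        return sum(abs(a.get(key, 0.0) - b.get(key, 0.0)) for key in keys)

    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes[True] > votes[False]


# Hypothetical toy corpus: relative frequencies of two function words.
docs = [
    ("A", {"the": 0.9, "of": 0.1}),
    ("A", {"the": 0.8, "of": 0.2}),
    ("B", {"the": 0.2, "of": 0.8}),
    ("C", {"the": 0.1, "of": 0.9}),
    ("D", {"the": 0.3, "of": 0.7}),
]

train = make_verification_set(docs, "A")
looks_like_a = knn_verify(train, {"the": 0.85, "of": 0.15})
looks_like_other = knn_verify(train, {"the": 0.15, "of": 0.85})
```

With many candidate authors, the pooled negative class in such a setup is much larger than the positive class, which is one reason limited training data per author matters for verification.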