We investigate recently proposed character and word sequence kernels for authorship attribution based on relatively short texts. Their performance is compared with that of two corresponding probabilistic approaches based on Markov chains. Several configurations of the sequence kernels are studied on a relatively large dataset (50 authors), in which each author covers several topics. With Moffat smoothing, the two probabilistic approaches obtain similar performance, which in turn is comparable to that of character sequence kernels and better than that of word sequence kernels. The results further suggest that, in a realistic setup that accounts for texts not written by any of the hypothesised authors, the amount of training material has more influence on discrimination performance than the amount of test material. Moreover, we show that the recently proposed author unmasking approach is less useful when dealing with short texts.