COLING '04 Proceedings of the 20th international conference on Computational Linguistics
The identification of authorship falls into the category of style classification, an interesting subfield of text categorization that deals with properties of the form of linguistic expression as opposed to the content of a text. Various feature sets and classification methods have been proposed in the literature, geared towards abstracting away from the content of a text and focusing on its stylistic properties. We demonstrate that in a realistically difficult authorship attribution scenario, deep linguistic analysis features such as context-free grammar production frequencies and semantic relationship frequencies achieve a significant error reduction over more commonly used "shallow" features such as function word frequencies and part-of-speech trigrams. Modern machine learning techniques such as support vector machines allow us to explore large feature vectors, combining these different feature sets to achieve high classification accuracy in style-based tasks.
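The idea of combining shallow and deep feature sets into one large vector for an SVM can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the author's system. Tokens, part-of-speech tags, and parse trees are supplied by hand; a real pipeline would obtain them from a tagger and a parser. The ten-word `FUNCTION_WORDS` list is a hypothetical stand-in. Semantic relationship features are omitted because they require a semantic analyzer. The classifier is a swapped-in Pegasos-style linear SVM (stochastic sub-gradient descent on the hinge loss, no bias term), not necessarily the trainer used in the original work. Each feature set is converted to relative frequencies and given a name prefix (`fw:`, `pos:`, `cfg:`) so the sets can be merged without collisions. Word-level rules such as `NN -> cat` are excluded from the production counts so that the deep features reflect syntax rather than vocabulary. All function names and toy sentences are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy function-word list; real systems use a few hundred words.
FUNCTION_WORDS = {"the", "a", "of", "and", "to", "in", "it", "on", "but", "she"}

Features = Dict[str, float]  # sparse vector: feature name -> value

def relative(counts: Counter, prefix: str, total: int) -> Features:
    """Turn raw counts into relative frequencies under a namespace prefix."""
    return {prefix + key: c / (total or 1) for key, c in counts.items()}

def function_word_features(tokens: List[str]) -> Features:
    """Shallow: frequency of each function word relative to document length."""
    counts = Counter(t.lower() for t in tokens if t.lower() in FUNCTION_WORDS)
    return relative(counts, "fw:", len(tokens))

def pos_trigram_features(tags: List[str]) -> Features:
    """Shallow: relative frequency of part-of-speech trigrams."""
    grams = Counter(" ".join(tags[i:i + 3]) for i in range(len(tags) - 2))
    return relative(grams, "pos:", sum(grams.values()))

def productions(tree: Tuple):
    """Yield phrasal productions 'LHS -> RHS' from a (label, child, ...) tree.
    Word-level rules such as 'NN -> cat' are skipped to abstract from content."""
    label, *children = tree
    if all(isinstance(c, str) for c in children):
        return  # preterminal node: its only child is a word
    yield label + " -> " + " ".join(child[0] for child in children)
    for child in children:
        yield from productions(child)

def production_features(trees: List[Tuple]) -> Features:
    """Deep: relative frequency of context-free productions over all parses."""
    counts = Counter(p for tree in trees for p in productions(tree))
    return relative(counts, "cfg:", sum(counts.values()))

def featurize(tokens: List[str], tags: List[str], trees: List[Tuple]) -> Features:
    """Merge all feature sets into one sparse vector; prefixes keep them disjoint."""
    return {**function_word_features(tokens), **pos_trigram_features(tags),
            **production_features(trees)}

def dot(w: Features, x: Features) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_pegasos(data, lam=0.1, epochs=20, seed=0) -> Features:
    """Swapped-in stand-in: linear SVM (no bias) trained by Pegasos stochastic
    sub-gradient descent on the hinge loss; labels must be +1 or -1."""
    rng, w, t = random.Random(seed), {}, 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):  # shuffled pass
            t += 1
            x, y = data[i]
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1  # hinge loss active for this example
            for k in w:  # shrink step from the L2 regularizer
                w[k] *= 1.0 - eta * lam
            if violated:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w

def predict(w: Features, x: Features) -> int:
    return 1 if dot(w, x) >= 0 else -1

# Invented, hand-tagged and hand-parsed toy sentences for two "authors".
def declarative(words: str) -> Features:  # author +1: DT NN VBD IN DT NN
    d1, n1, v, p, d2, n2 = words.split()
    tree = ("S", ("NP", ("DT", d1), ("NN", n1)),
            ("VP", ("VBD", v), ("PP", ("IN", p), ("NP", ("DT", d2), ("NN", n2)))))
    return featurize(words.split(), ["DT", "NN", "VBD", "IN", "DT", "NN"], [tree])

def imperative(words: str, tag: str, phrase: str) -> Features:  # author -1
    v1, mid, cc, v2 = words.split()
    tree = ("S", ("VP", ("VP", ("VB", v1), (phrase, (tag, mid))),
                  ("CC", cc), ("VP", ("VB", v2))))
    return featurize(words.split(), ["VB", tag, "CC", "VB"], [tree])

TRAIN = [(declarative("the cat sat on the mat"), 1),
         (declarative("a dog slept in the barn"), 1),
         (imperative("go home and rest", "RB", "ADVP"), -1),
         (imperative("take it and run", "PRP", "NP"), -1)]

if __name__ == "__main__":
    weights = train_pegasos(TRAIN)
    print(predict(weights, declarative("the bird sang in the tree")))        # 1
    print(predict(weights, imperative("sit down and wait", "RB", "ADVP")))  # -1
```

Because every feature is a named entry in a plain dictionary, the same vectors could also be fed to off-the-shelf tooling, for example scikit-learn's `DictVectorizer` followed by `LinearSVC`. The two toy authors have completely disjoint features, so the demo shows only the mechanics of combining feature sets, not the error reductions the abstract reports.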