Authorship identification can be viewed as a text categorization task. However, it differs from typical text categorization in three respects. The most frequent features tend to be the most important discriminators. Training texts are usually scarce. The available training texts are rarely evenly distributed over the candidate authors. To cope with these problems, we propose second-order tensors for representing the stylistic properties of texts. Our approach requires far fewer parameters to be estimated than the traditional vector space representation. We examine various methods for building appropriate tensors, taking into account that similar features should be placed in the same neighborhood. We use an existing generalization of SVM that can handle tensors. With it, we perform experiments on corpora controlled for genre and topic. The results show that the proposed approach can effectively handle cases where only limited training texts are available.
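The parameter-count argument can be illustrated with a minimal sketch. It assumes a rank-one bilinear decision function f(X) = uᵀXv + b, the form used by some tensor generalizations of SVM. The abstract does not state the exact model, so treat this form as an assumption. The names `to_tensor`, `bilinear_score` and `param_counts` are hypothetical, and training is omitted. The sketch only shows how a feature vector of length n₁·n₂ becomes an n₁ × n₂ matrix. It also shows why the bilinear model needs n₁ + n₂ + 1 parameters instead of n₁·n₂ + 1.

```python
import numpy as np


def to_tensor(freqs: np.ndarray, n_rows: int, n_cols: int) -> np.ndarray:
    """Reshape a feature-frequency vector into an n_rows x n_cols matrix.

    Assumption (hypothetical layout): features are pre-sorted, e.g. by corpus
    frequency, so row-major reshaping places features of similar rank in
    neighbouring cells of the same row.
    """
    if freqs.size != n_rows * n_cols:
        raise ValueError("vector length must equal n_rows * n_cols")
    return freqs.reshape(n_rows, n_cols)


def bilinear_score(X: np.ndarray, u: np.ndarray, v: np.ndarray, b: float) -> float:
    """Assumed rank-one tensor decision function: f(X) = u^T X v + b."""
    return float(u @ X @ v + b)


def param_counts(n_rows: int, n_cols: int) -> tuple[int, int]:
    """(linear model on the flattened vector, rank-one tensor model), bias included."""
    return n_rows * n_cols + 1, n_rows + n_cols + 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random(100)            # relative frequencies of 100 hypothetical features
    X = to_tensor(x, 10, 10)
    u, v, b = rng.standard_normal(10), rng.standard_normal(10), 0.5
    # The bilinear score equals a vector-space linear score whose weight
    # vector is the flattened outer product of u and v.
    w = np.outer(u, v).ravel()
    print(np.isclose(bilinear_score(X, u, v, b), w @ x + b))  # True
    print(param_counts(10, 10))    # (101, 21)
```

In this sketch the tensor model is a linear model whose weight vector is constrained to be an outer product. That constraint is where the parameter saving comes from, and it is why the placement of features within the matrix matters. The row-major layout above is only one hypothetical way of building the tensor.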