Tensor Space Models for Authorship Identification

Authors:
Spyridon Plakias; Efstathios Stamatatos
Affiliations:
Dept. of Information and Communication Systems Eng., University of the Aegean, Karlovassi, Greece 83200 (both authors)
Venue:
SETN '08: Proceedings of the 5th Hellenic Conference on Artificial Intelligence: Theories, Models and Applications
Year:
2008


Abstract

Authorship identification can be viewed as a text categorization task. However, in this task the most frequent features appear to be the most important discriminators, training texts are usually scarce, and they are rarely distributed evenly over the candidate authors. To cope with these problems, we propose second-order tensors for representing the stylistic properties of texts. Our approach requires the estimation of far fewer parameters than the traditional vector space representation. We examine various methods for building appropriate tensors, taking into account that similar features should be placed in the same neighborhood. Using an existing generalization of SVM that can handle tensors, we perform experiments on corpora controlled for genre and topic and show that the proposed approach can effectively handle cases where only limited training texts are available.
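The parameter saving can be illustrated with a minimal sketch, under assumptions that the abstract does not confirm. Suppose the m·n most frequent stylistic features are placed into an m×n matrix X. A linear classifier on the flattened vector needs m·n + 1 weights, whereas a rank-one model f(X) = uᵀXv + b, in the spirit of support tensor machines, needs only m + n + 1. Such a model can be trained by alternately fixing v and fitting a linear SVM for u, then fixing u and fitting v. The abstract does not say which tensor-building methods or which SVM generalization the paper uses. The names `build_tensor`, `param_counts`, `train_rank_one` and `predict`, the row-major feature ordering, the plain subgradient-descent SVM, and its hyperparameters are therefore hypothetical illustrations, not the authors' implementation.

```python
def build_tensor(freqs, rows, cols):
    """Place a feature-frequency vector into a rows x cols matrix, row-major.

    Hypothetical: assumes `freqs` is already ordered so that similar features
    end up adjacent (the paper examines several ways of doing this).
    """
    if len(freqs) != rows * cols:
        raise ValueError("need exactly rows*cols features")
    return [list(freqs[r * cols:(r + 1) * cols]) for r in range(rows)]


def param_counts(rows, cols):
    """Weights to learn (including the bias): full vector model vs. rank-one model."""
    return rows * cols + 1, rows + cols + 1


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _linear_svm(points, ys, lam=0.01, lr=0.1, epochs=500):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for x, y in zip(points, ys):
            if y * (_dot(w, x) + b) < 1:  # point violates the margin
                grad_w = [g - y * xj / n for g, xj in zip(grad_w, x)]
                grad_b -= y / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def _times_vector(X, v):
    """X v: one value per row of X."""
    return [_dot(row, v) for row in X]


def _vector_times(u, X):
    """u^T X: one value per column of X."""
    return [sum(u[r] * X[r][c] for r in range(len(X))) for c in range(len(X[0]))]


def train_rank_one(Xs, ys, iters=5):
    """Alternately fit u (v fixed) and v (u fixed) for f(X) = u^T X v + b."""
    v = [1.0] * len(Xs[0][0])
    for _ in range(iters):
        u, b = _linear_svm([_times_vector(X, v) for X in Xs], ys)
        v, b = _linear_svm([_vector_times(u, X) for X in Xs], ys)
    return u, v, b


def predict(model, X):
    """Return +1 or -1 according to the sign of u^T X v + b."""
    u, v, b = model
    return 1 if _dot(_vector_times(u, X), v) + b >= 0 else -1
```

For example, 2,500 features arranged as a 50×50 matrix give `param_counts(50, 50) == (2501, 101)`. This is the kind of reduction that matters when only a few training texts per author are available. The alternating scheme is only one way to fit a rank-one weight matrix. Each half-step is an ordinary linear SVM on a lower-dimensional projection of the data.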