In several situations, authors prefer to hide their identity. In forensic applications, examples include extortion and threats sent in emails and forum messages. Such messages are easily manipulated, so metadata referring to names and addresses is unreliable at best. In this paper, we propose a method to identify the authors of short, informal messages based solely on their text content. The method uses compression distances between texts as features, and a supervised classifier is trained on these features using messages from a set of known authors. For the experiments, we prepared a dataset of Dutch newsgroup texts. We compared our method with several state-of-the-art methods on identifying the author of messages among up to 50 candidate authors, and our method clearly outperformed them: the author was correctly identified in 65% of the cases, and the true author was in the top 5 of the produced ranked list in 88% of the cases.
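The pipeline described above (compression distances as features, then a supervised classifier that produces a ranked list of authors) can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation. The abstract does not name the compressor, the exact distance, or the classifier, so the code assumes the normalized compression distance NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)) with zlib as the compressor C. It uses the training messages themselves as prototypes of a dissimilarity representation, and it uses a nearest-class-mean ranker as a simple stand-in classifier. The names `ncd`, `features`, `NearestMeanRanker` and `top_k_accuracy` are hypothetical.

```python
import zlib
from collections import defaultdict


def compressed_size(data: bytes) -> int:
    """Assumed compressor C(.): length of the zlib output at level 9."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance (assumed form of the 'compression distance').

    Texts that share much structure compress well together, giving a value near 0.
    Unrelated texts give a value near 1.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def features(text: str, prototypes: list[str]) -> list[float]:
    """Dissimilarity representation: the NCD from `text` to each prototype text."""
    return [ncd(text, p) for p in prototypes]


class NearestMeanRanker:
    """Hypothetical stand-in classifier: nearest author mean in NCD feature space."""

    def fit(self, texts: list[str], authors: list[str]) -> "NearestMeanRanker":
        # Assumption: every training message also serves as a prototype.
        self.prototypes = list(texts)
        by_author = defaultdict(list)
        for text, author in zip(texts, authors):
            by_author[author].append(features(text, self.prototypes))
        # Element-wise mean of each author's training feature vectors.
        self.means = {
            a: [sum(col) / len(vecs) for col in zip(*vecs)]
            for a, vecs in by_author.items()
        }
        return self

    def rank(self, text: str) -> list[str]:
        """Return every known author, ordered from most to least likely."""
        vec = features(text, self.prototypes)

        def sq_dist(author: str) -> float:
            return sum((v - m) ** 2 for v, m in zip(vec, self.means[author]))

        return sorted(self.means, key=sq_dist)


def top_k_accuracy(model: NearestMeanRanker, texts: list[str],
                   authors: list[str], k: int) -> float:
    """Fraction of messages whose true author appears in the top k of the ranking."""
    hits = sum(a in model.rank(t)[:k] for t, a in zip(texts, authors))
    return hits / len(texts)


if __name__ == "__main__":
    train = ["the cat sat on the mat " * 5, "the cat sat on a hat " * 5,
             "zyx qwv 123 987 jkl " * 5, "zyx qwv 321 789 lkj " * 5]
    labels = ["alice", "alice", "bob", "bob"]
    model = NearestMeanRanker().fit(train, labels)
    print(model.rank("the cat sat on the mat again " * 5))
```

With k = 1, `top_k_accuracy` measures the correct-identification rate, and with k = 5 it measures how often the true author is in the top 5 of the ranked list. These are the two quantities reported in the abstract. The toy data here are illustrative only and are unrelated to the Dutch newsgroup dataset. A stronger learned classifier on the same NCD features could replace the nearest-mean rule without changing the rest of the sketch.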