In this paper we explore the use of text-mining methods for identifying the author of a text. We apply the support vector machine (SVM) to this problem: because it can cope with half a million inputs, it requires no feature selection and can process the frequency vector of all words of a text. We performed a number of experiments with texts from a German newspaper. With nearly perfect reliability the SVM rejected other authors, and it detected the target author in 60–80% of the cases. In a second experiment, we ignored nouns, verbs and adjectives and replaced them with grammatical tags and bigrams. This resulted in slightly reduced performance. Author detection with SVMs on full word forms was remarkably robust even if the author wrote about different topics.
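The core idea, a linear SVM trained on the frequency vector of every word form with no feature selection, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a training algorithm, so a Pegasos-style subgradient method for the hinge loss stands in here, and the function names, toy English texts, and parameter values (`epochs`, `lam`) are all assumptions made for illustration.

```python
from collections import Counter


def word_freqs(text):
    """Relative frequency of every word form in a text (no feature selection)."""
    words = text.lower().split()
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}


def score(weights, x):
    """Linear decision value w . x for a sparse feature dict x."""
    return sum(weights.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(samples, labels, epochs=50, lam=0.01):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Assumption: a stand-in trainer, not the one used in the paper.
    labels are +1 (target author) or -1 (any other author); no bias term.
    """
    weights = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            violated = y * score(weights, x) < 1.0  # inside the margin?
            shrink = 1.0 - 1.0 / t                  # effect of the L2 penalty
            for f in weights:
                weights[f] *= shrink
            if violated:
                eta = 1.0 / (lam * t)               # decaying step size
                for f, v in x.items():
                    weights[f] = weights.get(f, 0.0) + eta * y * v
    return weights


# Hypothetical toy corpus: two texts by the target author, two by others.
texts = [
    "the market fell and the bank raised rates",
    "the team won and the coach praised the keeper",
    "the bank said the market will recover",
    "the keeper and the team trained all week",
]
labels = [1, -1, 1, -1]

samples = [word_freqs(doc) for doc in texts]
weights = train_linear_svm(samples, labels)

unseen = word_freqs("the bank and the market")
print("target author" if score(weights, unseen) > 0 else "other author")
```

Keeping the features as sparse dictionaries mirrors the point made in the abstract: the full vocabulary can be used as input, because each text only touches the small number of word forms it actually contains.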