In this paper we present an approach to automatic authorship attribution for real-world (or unrestricted) text. Our method is based on the computational analysis of the input text by a text-processing tool. Besides the style markers relating to the output of this tool, we also use analysis-dependent style markers, that is, measures that represent the way in which the text has been processed. Neither word frequency counts nor other lexically based measures are taken into account. We show, using multiple regression, that the proposed set of style markers is able to distinguish the texts of various authors of a weekly newspaper. All the experiments we present were performed on real-world text downloaded from the World Wide Web. Our approach is easily trainable and fully automated, requiring neither manual text preprocessing nor sampling.
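To make the classification step concrete, the following is a minimal sketch of one plausible way to use multiple regression for author discrimination: fit one least-squares regression per author with a 0/1 response, then assign an unseen text to the author whose regression gives the highest predicted response. The abstract does not specify the exact regression setup, so this scheme is an assumption. The two style markers (average sentence length and the share of words the tool left unanalysed), the author labels, and all numeric values are hypothetical and serve only as illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_linear(X, y):
    """Ordinary least squares via the normal equations.

    Returns the weights with the intercept first.
    """
    rows = [[1.0] + list(x) for x in X]
    n = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    return solve(A, b)


def train(samples):
    """Fit one regression per author (assumed one-vs-rest, 0/1 response).

    samples: list of (author, style_marker_vector) pairs.
    """
    authors = sorted({a for a, _ in samples})
    X = [v for _, v in samples]
    return {
        a: fit_linear(X, [1.0 if s == a else 0.0 for s, _ in samples])
        for a in authors
    }


def predict(models, vector):
    """Return the author whose regression yields the highest response."""
    def score(w):
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], vector))
    return max(models, key=lambda a: score(models[a]))


# Hypothetical training texts: [avg. sentence length, share of unanalysed words]
training = [
    ("A", [12.0, 0.10]), ("A", [13.0, 0.12]), ("A", [11.5, 0.09]),
    ("B", [22.0, 0.30]), ("B", [24.0, 0.28]), ("B", [21.0, 0.33]),
]
models = train(training)
print(predict(models, [12.5, 0.11]))
print(predict(models, [23.0, 0.31]))
```

With more than two authors the same code applies unchanged, since each author gets an independent regression. A real marker set would likely need more training texts than markers for the normal equations to be solvable.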