Elements of information theory
Elements of information theory
A Tutorial on Support Vector Machines for Pattern Recognition
Data Mining and Knowledge Discovery
Using Literal and Grammatical Statistics for Authorship Attribution
Problems of Information Transmission
Using the Web as corpus for self-training text categorization
Information Retrieval
A new approach for cross-language plagiarism analysis
CLEF'10 Proceedings of the 2010 international conference on Multilingual and multimodal information access evaluation: cross-language evaluation forum
Biometrics: An Overview on New Technologies and Ethic Problems
International Journal of Technoethics
Hi-index | 0.00 |
We survey the authorship attribution of documents given some prior stylistic characteristics of the author's writing extracted from a corpus of known works, e.g., authentication of disputed documents or literary works. Although the pioneering paper based on word length histograms appeared at the very end of the nineteenth century, the resolution power of this and other stylometry approaches is yet to be studied both theoretically and on case studies such that additional information can assist finding the correct attribution. We survey several theoretical approaches including ones approximating the apparently nearly optimal one based on Kolmogorov conditional complexity and some case studies: attributing Shakespeare canon and newly discovered works as well as allegedly M. Twain's newly-discovered works, Federalist papers binary (Madison vs. Hamilton) discrimination using Naive Bayes and other classifiers, and steganography presence testing. The latter topic is complemented by a sketch of an anagrams ambiguity study based on the Shannon cryptography theory.