Using Literal and Grammatical Statistics for Authorship Attribution

Authors:
O. V. Kukushkina;A. A. Polikarpov;D. V. Khmelev
Venue:
Problems of Information Transmission
Year:
2001

Abstract

Markov chains are used as a formal mathematical model for the sequence of elements in a text, and this model is applied to authorship attribution. The elements considered are either letters or the grammatical classes of words. It turns out that the frequencies of letter pairs and of pairs of grammatical classes in a Russian text are rather stable characteristics of an author and, apparently, could be used to attribute texts of disputed authorship. Results are compared for several modifications of the method using both letters and grammatical classes. The experiments cover 385 texts by 82 writers. The Appendix describes the research of D. V. Khmelev, in which data compression algorithms are applied to authorship attribution.
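
The letter-pair idea in the abstract can be illustrated with a minimal sketch. It assumes a first-order Markov chain over letters (with all non-letter characters collapsed to a single space), add-one smoothing, and a per-pair average negative log-likelihood as the score. The function names, the smoothing, and the default `alphabet_size` of 34 (33 Russian letters plus the space) are choices made for this illustration and are not necessarily the estimator used in the paper.

```python
import math
from collections import Counter, defaultdict


def letter_pairs(text):
    """Return consecutive character pairs; non-letters collapse to one space."""
    chars = [c if c.isalpha() else " " for c in text.lower()]
    cleaned = " ".join("".join(chars).split())
    return zip(cleaned, cleaned[1:])


def train(texts):
    """Count transitions a -> b over all known texts of one author."""
    counts = defaultdict(Counter)
    for t in texts:
        for a, b in letter_pairs(t):
            counts[a][b] += 1
    return counts


def neg_log_likelihood(counts, text, alphabet_size=34):
    """Average -log P(b | a) over the pairs of `text` under one author's model.

    Add-one smoothing is an assumption of this sketch; it keeps unseen
    pairs from producing log(0).
    """
    total, n = 0.0, 0
    for a, b in letter_pairs(text):
        row = counts.get(a, {})
        row_total = sum(row.values())
        p = (row.get(b, 0) + 1) / (row_total + alphabet_size)
        total -= math.log(p)
        n += 1
    return total / n if n else float("inf")


def attribute(models, text):
    """Pick the author whose transition model scores the text best (lowest)."""
    return min(models, key=lambda author: neg_log_likelihood(models[author], text))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    models = {
        "A": train(["aaaa bbbb aaaa bbbb aaaa"]),
        "B": train(["abab abab abab abab"]),
    }
    print(attribute(models, "aabb aa"))
```

The same functions would work on sequences of grammatical classes if `letter_pairs` were replaced by a function that yields consecutive class tags; producing those tags requires a morphological tagger, which is outside the scope of this sketch.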