Detection of different authorship of text sequences through self-organizing maps and mutual information function

  • Authors:
  • Antonio Neme;Blanca Lugo;Alejandra Cervera

  • Affiliations:
  • Complex systems and nonlinear dynamics group, Universidad Autónoma de la Ciudad de México, México, D.F., México;Facultad de Ciencias, Universidad Autónoma del Estado de Coahuila, Saltillo, Coah, México;Comisión Nacional para el uso y conocimiento de la biodiversidad, México

  • Venue:
  • MICAI'10: Proceedings of the 9th Mexican International Conference on Artificial Intelligence, Advances in Soft Computing, Part II
  • Year:
  • 2010

Abstract

Writers tend to express their ideas in distinct styles, characterized by the so-called signature or stylome: an abstraction of the general constraints and specific word combinations they choose to follow within their language. Although capturing this style has proven very difficult, some advances have been achieved. Here, we present a novel system that is trained with texts from a single author, is able to unveil some of that author's stylistic features, and applies them to detect texts not written by the same author or, at least, not written with the previously learned features. The system is a hybrid model based on self-organizing maps and information-theoretic measures. In the model, the mutual information function of an unknown text is compared to the mutual information function of texts from a known author. If the distance between these two functions exceeds a certain threshold, the unknown text is attributed to a different author; otherwise, the authorship is taken to be the same. The decision threshold is obtained from the self-organizing map trained with the texts from the known author. We present results on authorship identification in several contexts, including classic literature, journalism (political, economic, sports), and popular science writing.
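
The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: the mutual information function is computed over characters at lags 1 to max_lag, profiles are compared with Euclidean distance, and the threshold that the paper derives from a self-organizing map is replaced here by a much simpler stand-in (the largest distance of any known text from the mean profile). All function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information_function(symbols, max_lag):
    """Return [I(1), ..., I(max_lag)] in bits for a sequence of symbols.

    I(k) is the mutual information between the symbol at position i
    and the symbol at position i + k, estimated from pair counts.
    """
    values = []
    for lag in range(1, max_lag + 1):
        pairs = list(zip(symbols, symbols[lag:]))
        n = len(pairs)
        if n == 0:
            values.append(0.0)  # sequence shorter than the lag
            continue
        joint = Counter(pairs)
        left = Counter(a for a, _ in pairs)
        right = Counter(b for _, b in pairs)
        mi = 0.0
        for (a, b), count in joint.items():
            # p(a,b) / (p(a) p(b)) simplifies to count * n / (left[a] * right[b])
            mi += (count / n) * math.log2(count * n / (left[a] * right[b]))
        values.append(mi)
    return values


def distance(f, g):
    """Euclidean distance between two equally long profiles (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(f, g)))


def fit_known_author(known_texts, max_lag):
    """Return (mean profile, threshold) for texts by one known author.

    ASSUMPTION: the threshold is the largest distance from any known
    profile to the mean profile. The paper obtains it from a trained
    self-organizing map instead; this is only a simple stand-in.
    """
    profiles = [mutual_information_function(t, max_lag) for t in known_texts]
    centre = [sum(column) / len(profiles) for column in zip(*profiles)]
    threshold = max(distance(p, centre) for p in profiles)
    return centre, threshold


def same_author(unknown_text, centre, threshold, max_lag):
    """True if the unknown text's profile lies within the threshold."""
    profile = mutual_information_function(unknown_text, max_lag)
    return distance(profile, centre) <= threshold


if __name__ == "__main__":
    # Toy strings standing in for real texts, for illustration only.
    known = ["abababababab", "abababababababab"]
    centre, threshold = fit_known_author(known, max_lag=3)
    print(same_author("abababababababab", centre, threshold, max_lag=3))
    print(same_author("aaaaaaaaaaaa", centre, threshold, max_lag=3))
```

In this sketch a text is accepted as being by the known author only when its profile is no farther from the mean profile than the most atypical training text. A threshold taken from a trained map, as in the paper, would presumably capture the spread of the author's texts in a less crude way.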