Applying authorship analysis to arabic web content

  • Authors:
  • Ahmed Abbasi;Hsinchun Chen

  • Affiliations:
  • Department of Management Information Systems, The University of Arizona, Tucson, AZ;Department of Management Information Systems, The University of Arizona, Tucson, AZ

  • Venue:
  • ISI'05 Proceedings of the 2005 IEEE international conference on Intelligence and Security Informatics
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

The advent and rapid proliferation of internet communication has allowed the realization of numerous security issues. The anonymous nature of online mediums such as email, web sites, and forums provides an attractive communication method for criminal activity. Increased globalization and the boundless nature of the internet have further amplified these concerns due to the addition of a multilingual dimension. The world's social and political climate has caused Arabic to draw a great deal of attention. In this study we apply authorship identification techniques to Arabic web forum messages. Our research uses lexical, syntactic, structural, and content-specific writing style features for authorship identification. We address some of the problematic characteristics of Arabic in route to the development of an Arabic language model that provides a respectable level of classification accuracy for authorship discrimination. We also run experiments to evaluate the effectiveness of different feature types and classification techniques on our dataset.