A Self-Supervised Approach to Comment Spam Detection Based on Content Analysis

  • Authors:
  • A. Bhattarai;D. Dasgupta

  • Affiliations:
  • University of Memphis, USA;University of Memphis, USA

  • Venue:
  • International Journal of Information Security and Privacy
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper studies the problems and threats posed by a type of spam in the blogosphere, called blog comment spam. It explores the challenges introduced by comment spam, generalizing the analysis substantially to any other short text type spam. The authors analyze different high-level features of spam and legitimate comments based on the content of blog postings. The authors use these features to cluster data separately for each feature using K-Means clustering algorithm. The authors also use self-supervised learning, which could classify spam and legitimate comments automatically. Compared with existing solutions, this approach demonstrates more flexibility and adaptability to the environment, as it requires minimal human intervention. The preliminary evaluation of the proposed spam detection system shows promising results.