Exploring linguistic features for web spam detection: a preliminary study

  • Authors:
  • Jakub Piskorski; Marcin Sydow; Dawid Weiss

  • Affiliations:
  • Joint Research Centre of the European Commission, Ispra, VA, Italy; Polish-Japanese Institute of Information Technology, Koszykowa, Warsaw, Poland; Poznań University of Technology, Piotrowo, Poznań, Poland

  • Venue:
  • AIRWeb '08: Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web
  • Year:
  • 2008

Abstract

We study the usability of linguistic features in the Web spam classification task. The features were computed on two Web spam corpora, Webspam-Uk2006 and Webspam-Uk2007, and we make the computed features publicly available to other researchers. Preliminary analysis indicates that certain linguistic features may be useful for the spam detection task when combined with features studied elsewhere.