Automatically generated spam detection based on sentence-level topic information

  • Authors:
  • Yoshihiko Suhara; Hiroyuki Toda; Shuichi Nishioka; Seiji Susaki

  • Affiliations:
  • NTT Service Evolution Laboratories, NTT Corporation, Yokosuka-shi, Kanagawa, Japan;NTT Service Evolution Laboratories, NTT Corporation, Yokosuka-shi, Kanagawa, Japan;NTT Service Evolution Laboratories, NTT Corporation, Yokosuka-shi, Kanagawa, Japan;NTT Service Evolution Laboratories, NTT Corporation, Yokosuka-shi, Kanagawa, Japan

  • Venue:
  • Proceedings of the 22nd International Conference on World Wide Web Companion
  • Year:
  • 2013

Abstract

Spammers use a wide range of content generation techniques to produce low-quality pages, known as content spam, in order to achieve their goals. We argue that content spam must be tackled using a wide range of content quality features. In this paper, we propose novel sentence-level diversity features based on a probabilistic topic model. We combine them with other content features to build a content spam classifier. Our experiments show that our method outperforms conventional methods.
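
The abstract does not define the sentence-level diversity features, so the following is only an illustrative sketch of what one such feature could look like, not the authors' method. It assumes that a topic model has already assigned each sentence a topic distribution (a list of probabilities summing to 1), and it measures diversity as the mean pairwise Jensen-Shannon divergence between sentences. The function names and the choice of divergence are assumptions made for this example.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p is zero contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def sentence_topic_diversity(sentence_topics):
    """Mean pairwise JS divergence over per-sentence topic distributions.

    Hypothetical feature: `sentence_topics` is a list of topic
    distributions, one per sentence, assumed to come from a topic model.
    Returns 0.0 when there are fewer than two sentences.
    """
    pairs = list(combinations(sentence_topics, 2))
    if not pairs:
        return 0.0
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)


# Sentences that share one topic mix versus sentences on disjoint topics.
coherent = [[0.9, 0.1], [0.9, 0.1]]
scattered = [[1.0, 0.0], [0.0, 1.0]]
print(sentence_topic_diversity(coherent))
print(sentence_topic_diversity(scattered))
```

A score like this would be one numeric input among the other content features that the abstract says are combined in the classifier. Whether high or low diversity indicates spam is an empirical question that this sketch does not answer.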