Application of Collocation to Spam Filtering

  • Authors:
  • Jing Zhang;Jianmin Yao;Shoubin Dong;Ling Zhang

  • Venue:
  • ETTANDGRS '08 Proceedings of the 2008 International Workshop on Education Technology and Training & 2008 International Workshop on Geoscience and Remote Sensing - Volume 02
  • Year:
  • 2008

Quantified Score

Hi-index 0.01

Abstract

Collocations are frequent bigrams that carry semantic meaning and serve a grammatical function. Both adjacent and long-distance collocations are extracted as features for a Bayesian classifier in spam filtering. Compared with the common unigram features, the collocation-based classifier improves on all evaluation metrics. The influence of mail header information on the classifier is also studied; it shows a 10% change in both precision and recall.
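
The abstract does not say how the collocations are extracted or which Bayesian classifier is used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that adjacent collocations can be approximated by neighbouring word pairs and long-distance collocations by word pairs within a small window, and it swaps in a plain multinomial naive Bayes model with Laplace smoothing. The names `collocation_features`, `NaiveBayes`, `window`, and `alpha`, as well as the toy messages, are hypothetical.

```python
import math
from collections import Counter


def collocation_features(tokens, window=3):
    """Return word-pair features: adjacent pairs (gap 1) and
    long-distance pairs (gap 2..window). Assumed approximation only."""
    feats = []
    for i, left in enumerate(tokens):
        for gap in range(1, window + 1):
            j = i + gap
            if j >= len(tokens):
                break
            kind = "adj" if gap == 1 else "dist"
            feats.append(f"{kind}:{left}_{tokens[j]}")
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = {}       # label -> Counter of feature counts
        self.docs = Counter()  # label -> number of training messages
        self.vocab = set()     # every feature seen during training

    def fit(self, docs, labels):
        for feats, label in zip(docs, labels):
            self.docs[label] += 1
            self.counts.setdefault(label, Counter()).update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            # Log prior plus smoothed log likelihood of each known feature.
            score = math.log(self.docs[label] / total_docs)
            denom = sum(counter.values()) + self.alpha * len(self.vocab)
            for f in feats:
                if f in self.vocab:  # unseen features are skipped
                    score += math.log((counter[f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented purely for illustration.
train = [
    ("win free money now", "spam"),
    ("free money offer today", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes attached", "ham"),
]
model = NaiveBayes().fit(
    [collocation_features(text.split()) for text, _ in train],
    [label for _, label in train],
)
print(model.predict(collocation_features("free money inside".split())))
```

A real system would likely keep only pairs that pass a frequency or association threshold, since the abstract defines collocations as frequent bigrams; the sketch omits that filter for brevity. Header fields, whose influence the abstract mentions, could be tokenized and passed through the same feature function.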