e-mail authentication system: a spam filtering for smart senders
Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human
Collocations are frequent bigrams that carry semantic meaning and serve a grammatical function. Both adjacent and long-distance collocations are extracted as features for a Bayesian classifier used in spam filtering. Compared with the common unigram features, the collocation-based classifier improves on all evaluation metrics. The influence of mail header information on the classifier is also studied; it produces a 10% change in both precision and recall.
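
The idea of feeding adjacent and long-distance word pairs to a Bayesian classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the names `collocation_features` and `NaiveBayes`, the `max_gap` window, whitespace tokenization, and Laplace smoothing with `alpha` are all assumptions made for illustration, and the frequency-based selection of collocations described in the abstract is omitted for brevity.

```python
import math
from collections import Counter, defaultdict


def collocation_features(text, max_gap=2):
    """Return word-pair features from whitespace tokens (hypothetical helper).

    A pair with no tokens in between is tagged "adj" (adjacent); a pair
    separated by 1..max_gap tokens is tagged "far" (long distance).
    """
    tokens = text.lower().split()
    feats = []
    for i, left in enumerate(tokens):
        for gap in range(max_gap + 1):
            j = i + 1 + gap
            if j >= len(tokens):
                break
            kind = "adj" if gap == 0 else "far"
            feats.append((kind, left, tokens[j]))
    return feats


class NaiveBayes:
    """Multinomial naive Bayes over pair features (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed)
        self.feature_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            self.doc_counts[label] += 1
            for feat in collocation_features(text):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for feat in collocation_features(text):
                if feat in self.vocab:  # skip pairs never seen in training
                    score += math.log((counts[feat] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented purely for this example.
train_docs = [
    "win free money now",
    "free money offer now",
    "meeting agenda for monday",
    "project meeting on monday",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("free money today"))
print(model.predict("meeting on monday"))
```

Tagging each pair as "adj" or "far" keeps adjacent and long-distance collocations as separate features, so the classifier can weight them differently. A unigram baseline for comparison would replace `collocation_features` with a function that returns the individual tokens.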