Improved Bayesian spam filtering based on co-weighted multi-area information

  • Authors:
  • Raju Shrestha;Yaping Lin

  • Affiliations:
  • Department of Computer and Communication, Hunan University, Changsha, P.R. China;Department of Computer and Communication, Hunan University, Changsha, P.R. China

  • Venue:
  • PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2005

Abstract

Bayesian spam filters generally compute probability estimates for tokens in one of two ways: they either ignore the email areas in which a token occurs (apart from the body), or they treat the same token occurring in different areas as different tokens. In reality, however, occurrences of the same token in different areas are inter-related, and this relation can also play a role in classification. In this paper we incorporate this novel idea by correlating multi-area information through co-weighting, which yields more effective combined, integrated probability estimates for tokens. The new approach is compared with individual area-wise estimates and with traditional separate estimates across all areas. Experimental results on three public corpora show significant improvement, stability, robustness and consistency in spam filtering with the proposed estimation.
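The abstract does not state the co-weighting formula, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' method. It estimates a Graham-style spam probability for each token separately in each email area, then merges the area estimates into one integrated probability per token using a weighted average. The weights are a hypothetical per-area factor (`AREA_WEIGHTS`) multiplied by how many training messages contained the token in that area. Message scores use the usual naive-Bayes product combination. All names (`CoWeightedFilter`, `AREA_WEIGHTS`, `train`, `token_prob`, `score`) are illustrative.

```python
from collections import defaultdict
from math import prod

# Hypothetical per-area weights (an assumption; the paper's weights are not given here).
AREA_WEIGHTS = {"subject": 1.5, "body": 1.0}


class CoWeightedFilter:
    def __init__(self, area_weights=None):
        self.w = dict(area_weights or AREA_WEIGHTS)
        # counts[label][area][token] = number of training messages with that
        # label in which the token appears in that area
        self.counts = {lab: defaultdict(lambda: defaultdict(int)) for lab in ("spam", "ham")}
        self.n = {"spam": 0, "ham": 0}

    def train(self, msg, label):
        """msg maps an area name to its list of tokens; label is 'spam' or 'ham'."""
        self.n[label] += 1
        for area, tokens in msg.items():
            for tok in set(tokens):
                self.counts[label][area][tok] += 1

    def area_prob(self, area, tok):
        """Return (per-area spam probability, evidence count), or (None, 0) if unseen."""
        s = self.counts["spam"][area][tok]
        h = self.counts["ham"][area][tok]
        if s + h == 0:
            return None, 0
        fs = s / max(self.n["spam"], 1)
        fh = h / max(self.n["ham"], 1)
        return fs / (fs + fh), s + h

    def token_prob(self, tok, default=0.4):
        """Integrated probability: evidence-and-area-weighted mean of area estimates.
        Only areas listed in self.w contribute (assumption of this sketch)."""
        num = den = 0.0
        for area, w in self.w.items():
            p, evidence = self.area_prob(area, tok)
            if p is not None:
                num += w * evidence * p
                den += w * evidence
        p = num / den if den else default
        # Clamp away from 0 and 1 so a single token cannot dominate the product.
        return min(max(p, 0.01), 0.99)

    def score(self, msg):
        """Naive-Bayes combination of integrated token probabilities (0.5 if empty)."""
        toks = {t for tokens in msg.values() for t in tokens}
        ps = [self.token_prob(t) for t in toks]
        if not ps:
            return 0.5
        sp = prod(ps)
        hp = prod(1 - p for p in ps)
        return sp / (sp + hp)
```

For example, after training on one spam message (`{"subject": ["cheap", "pills"], "body": ["buy", "cheap", "now"]}`) and one ham message (`{"subject": ["meeting"], "body": ["agenda", "for", "monday"]}`), a test message `{"subject": ["cheap"], "body": ["pills"]}` scores about 0.9999. Evidence for `pills` seen only in the subject still counts when it appears in the body, which is the point of an integrated rather than area-separate estimate. `math.prod` requires Python 3.8 or later.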