eMailSift: Email Classification Based on Structure and Content

  • Authors:
  • Manu Aery;Sharma Chakravarthy

  • Affiliations:
  • University of Texas at Arlington;University of Texas at Arlington

  • Venue:
  • ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
  • Year:
  • 2005

Quantified Score

Hi-index 0.02

Visualization

Abstract

In this paper we propose a novel approach that uses structure as well as the content of emails in a folder for email classification. Our approach is based on the premise that representative — common and recurring — structures/patterns can be extracted from a pre-classified email folder and the same can be used effectively for classifying incoming emails. A number of factors that influence representative structure extraction and the classification are analyzed conceptually and validated experimentally. In our approach, the notion of inexact graph match is leveraged for deriving structures that provide coverage for characterizing folder contents. Extensive experimentation validate the selection of parameters and the effectiveness of our approach for email classification.