Automatically classifying emails into activities

  • Authors:
  • Mark Dredze;Tessa Lau;Nicholas Kushmerick

  • Affiliations:
  • University of Pennsylvania, Philadelphia, PA;IBM Almaden Research Center, San Jose, CA;University College Dublin, Dublin, Ireland

  • Venue:
  • Proceedings of the 11th international conference on Intelligent user interfaces
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Email-based activity management systems promise to give users better tools for managing increasing volumes of email, by organizing email according to a user's activities. Current activity management systems do not automatically classify incoming messages by the activity to which they belong, instead relying on simple heuristics (such as message threads), or asking the user to manually classify incoming messages as belonging to an activity. This paper presents several algorithms for automatically recognizing emails as part of an ongoing activity. Our baseline methods are the use of message reply-to threads to determine activity membership and a naïve Bayes classifier. Our SimSubset and SimOverlap algorithms compare the people involved in an activity against the recipients of each incoming message. Our SimContent algorithm uses IRR (a variant of latent semantic indexing) to classify emails into activities using similarity based on message contents. An empirical evaluation shows that each of these methods provide a significant improvement to the baseline methods. In addition, we show that a combined approach that votes the predictions of the individual methods performs better than each individual method alone.