k-NN Aggregation with a Stacked Email Representation

  • Authors:
  • Amandine Orecchioni;Nirmalie Wiratunga;Stewart Massie;Susan Craw

  • Affiliations:
  • School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK AB25 1HG (all authors)

  • Venue:
  • ECCBR '08 Proceedings of the 9th European conference on Advances in Case-Based Reasoning
  • Year:
  • 2008

Abstract

The variety of email-related tasks, together with the growth in daily email load, has created a need for automated email management tools. In this paper, we provide an empirical evaluation of representational schemes and retrieval strategies for email. In particular, we study the impact of both textual and non-textual email content on case representation for email task management. Our first contribution is Stack, an email representation based on stacking. Multiple casebases are created, each using a different case representation related to attributes corresponding to semi-structured email content. A k-NN classifier is applied to each casebase, and the outputs are used to form a new case representation. Our second contribution is a new evaluation method that creates random, chronological, stratified train-test trials respecting both the temporal and class-distribution aspects that are crucial in the email domain. The Enron corpus was used to create a dataset for the email deletion prediction task. Evaluation results show significant improvements with Stack over both single-casebase retrieval and multiple-casebase retrieval combined by majority vote.
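
The stacking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the view names (`subject`, `sender`), the toy feature vectors, the squared Euclidean distance, the use of class vote fractions as the k-NN output, and the leave-one-out step for building the stacked training cases are all assumptions made for illustration only.

```python
from collections import Counter

LABELS = ["delete", "keep"]

# Hypothetical toy emails: each has one small numeric vector per view.
EMAILS = [
    {"views": {"subject": [0.0, 0.1], "sender": [1.0]}, "label": "delete"},
    {"views": {"subject": [0.1, 0.0], "sender": [0.9]}, "label": "delete"},
    {"views": {"subject": [0.2, 0.1], "sender": [1.0]}, "label": "delete"},
    {"views": {"subject": [1.0, 0.9], "sender": [0.0]}, "label": "keep"},
    {"views": {"subject": [0.9, 1.0], "sender": [0.1]}, "label": "keep"},
    {"views": {"subject": [1.0, 1.0], "sender": [0.0]}, "label": "keep"},
]


def knn_votes(cases, query, k):
    """Return the fraction of the k nearest cases voting for each label."""
    ranked = sorted(
        cases,
        key=lambda case: sum((a - b) ** 2 for a, b in zip(case[0], query)),
    )
    counts = Counter(label for _, label in ranked[:k])
    return [counts[label] / k for label in LABELS]


def stacked_representation(casebases, views, k, skip=None):
    """Concatenate the k-NN outputs of every casebase into one vector.

    `skip` leaves one case out so that a training email does not
    retrieve itself (an assumption of this sketch).
    """
    meta = []
    for name, cases in casebases.items():
        pool = [case for i, case in enumerate(cases) if i != skip]
        meta.extend(knn_votes(pool, views[name], k))
    return meta


def stack_predict(emails, query_views, k=3):
    """Classify a query email with k-NN over the stacked representation."""
    # One casebase per view, each with its own case representation.
    casebases = {
        name: [(e["views"][name], e["label"]) for e in emails]
        for name in emails[0]["views"]
    }
    # Stacked training cases built from the per-casebase k-NN outputs.
    meta_cases = [
        (stacked_representation(casebases, e["views"], k, skip=i), e["label"])
        for i, e in enumerate(emails)
    ]
    meta_query = stacked_representation(casebases, query_views, k)
    votes = knn_votes(meta_cases, meta_query, k)
    return LABELS[votes.index(max(votes))]


if __name__ == "__main__":
    query = {"subject": [0.1, 0.1], "sender": [0.95]}
    print(stack_predict(EMAILS, query))
```

In this sketch the final classifier never sees the raw attributes; it only sees how each casebase voted, which is what distinguishes a stacked representation from a plain majority vote over the individual casebases.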