Exploiting partial decision trees for feature subset selection in e-mail categorization

  • Authors:
  • Helmut Berger; Dieter Merkl; Michael Dittenbach

  • Affiliations:
  • iSpaces Group, Electronic Commerce, Wien, Austria; Technische Universität Wien, Austria; iSpaces Group, Electronic Commerce, Wien, Austria

  • Venue:
  • Proceedings of the 2006 ACM symposium on Applied computing
  • Year:
  • 2006

Abstract

In this paper we propose PARTfs, which adopts a supervised machine learning algorithm, namely partial decision trees, as a method for feature subset selection. In particular, we show that PARTfs achieves an aggressive reduction of the feature space while still yielding classification results comparable to those obtained with conventional feature selection metrics. The approach is empirically verified using two different document representations and four different text classification algorithms, applied to a document collection of personal e-mail messages. The results show that the feature space can be reduced by roughly an order of magnitude without loss of classification accuracy.
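
The abstract does not spell out how PARTfs turns a learned model into a feature subset, so the following is only a minimal sketch of the general idea, under two stated assumptions: that the selected features are the terms the learned model actually tests, and that a small information-gain decision tree over term presence can stand in for PART's partial decision trees. It is not the authors' implementation. The function names and the toy messages are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_term(docs, labels, candidates):
    """Return the candidate term with the highest information gain, or None."""
    base = entropy(labels)
    best, best_gain = None, 0.0
    for term in sorted(candidates):  # sorted for deterministic tie-breaking
        with_term = [y for d, y in zip(docs, labels) if term in d]
        without = [y for d, y in zip(docs, labels) if term not in d]
        if not with_term or not without:
            continue  # the term does not split this set of messages
        remainder = (len(with_term) * entropy(with_term)
                     + len(without) * entropy(without)) / len(labels)
        gain = base - remainder
        if gain > best_gain + 1e-12:
            best, best_gain = term, gain
    return best


def terms_used_by_tree(docs, labels, candidates):
    """Grow a tree on term presence and collect every term it splits on.

    Assumption: a full tree is used here as a simplified stand-in for the
    partial decision trees named in the abstract.
    """
    if len(set(labels)) <= 1:
        return set()  # pure (or empty) node: nothing left to split
    term = best_term(docs, labels, candidates)
    if term is None:
        return set()
    used = {term}
    rest = candidates - {term}
    for present in (True, False):
        subset = [(d, y) for d, y in zip(docs, labels) if (term in d) == present]
        sub_docs = [d for d, _ in subset]
        sub_labels = [y for _, y in subset]
        used |= terms_used_by_tree(sub_docs, sub_labels, rest)
    return used


# Toy, made-up messages represented as sets of terms (not data from the paper).
docs = [
    {"meeting", "agenda", "the"},
    {"meeting", "room", "the"},
    {"invoice", "payment", "the"},
    {"invoice", "due", "the"},
    {"holiday", "photos", "the"},
    {"holiday", "beach", "the"},
]
labels = ["work", "work", "finance", "finance", "private", "private"]

vocabulary = set().union(*docs)
selected = terms_used_by_tree(docs, labels, vocabulary)
print(len(vocabulary), "terms reduced to", sorted(selected))
```

Only the terms the tree splits on survive, which is why a model-based selector of this kind can shrink the vocabulary sharply. A downstream classifier would then be trained on documents restricted to `selected`.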