Exploiting partial decision trees for feature subset selection in e-mail categorization

Authors:
Helmut Berger;Dieter Merkl;Michael Dittenbach
Affiliations:
iSpaces Group, Electronic Commerce, Wien, Austria;Technische Universität Wien, Austria;iSpaces Group, Electronic Commerce, Wien, Austria
Venue:
Proceedings of the 2006 ACM symposium on Applied computing
Year:
2006

Abstract

In this paper we propose PARTfs, which adopts a supervised machine learning algorithm, namely partial decision trees, as a method for feature subset selection. In particular, we show that PARTfs achieves an aggressive reduction of the feature space while still yielding classification results comparable to those obtained with conventional feature selection metrics. The approach is empirically verified with two different document representations and four different text classification algorithms, applied to a document collection of personal e-mail messages. The results show that the feature space can be reduced by roughly a factor of ten without loss of classification accuracy.
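
The general idea of using a tree learner as a feature selector can be illustrated with a minimal sketch. This is not the PARTfs procedure itself: the abstract does not give its details, and the code below grows an ordinary greedy decision tree by information gain rather than the partial decision trees of PART. The function names (`entropy`, `best_split`, `tree_features`), the toy messages, and their labels are all hypothetical and chosen only for illustration. The sketch keeps just those terms that the tree actually tests and discards the rest of the vocabulary.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_split(docs, labels, candidates):
    """Return (gain, term) for the presence/absence test with the highest
    information gain, or (0.0, None) if no term separates the documents."""
    base = entropy(labels)
    best = (0.0, None)
    for term in sorted(candidates):  # sorted for deterministic tie-breaking
        present = [y for d, y in zip(docs, labels) if term in d]
        absent = [y for d, y in zip(docs, labels) if term not in d]
        if not present or not absent:
            continue
        remainder = (len(present) * entropy(present)
                     + len(absent) * entropy(absent)) / len(labels)
        gain = base - remainder
        if gain > best[0]:
            best = (gain, term)
    return best


def tree_features(docs, labels, candidates, max_depth=5):
    """Grow a tree greedily and return the set of terms it tests.
    Stand-in for a partial decision tree learner (assumption)."""
    if max_depth == 0 or len(set(labels)) <= 1:
        return set()
    _, term = best_split(docs, labels, candidates)
    if term is None:
        return set()
    selected = {term}
    remaining = candidates - {term}
    for keep in (True, False):
        idx = [i for i, d in enumerate(docs) if (term in d) == keep]
        selected |= tree_features([docs[i] for i in idx],
                                  [labels[i] for i in idx],
                                  remaining, max_depth - 1)
    return selected


# Hypothetical toy e-mails, each represented as a set of terms.
docs = [
    {"meeting", "agenda", "monday"},
    {"meeting", "project", "deadline"},
    {"invoice", "payment", "monday"},
    {"invoice", "payment", "deadline"},
    {"holiday", "beach", "photos"},
    {"holiday", "photos", "monday"},
]
labels = ["work", "work", "finance", "finance", "private", "private"]

vocabulary = set().union(*docs)
selected = tree_features(docs, labels, vocabulary)
print(f"{len(selected)} of {len(vocabulary)} terms kept: {sorted(selected)}")
```

On this toy data the tree needs only two of the ten terms to separate the three classes, which mirrors, on a very small scale, the kind of feature space reduction the abstract reports. A downstream classifier would then be trained on the selected terms only.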