Real-time email classification is challenging because it is an online task subject to concept drift. Spam identification, where only two labels exist, has received great attention in the literature. We are instead interested in classification into multiple folders, which adds a further source of complexity. Moreover, neither cross-validation nor other sampling procedures are suitable for evaluating data streams, so other metrics, such as the prequential error, have been proposed. The prequential error poses some problems of its own, however, which can be alleviated with mechanisms such as fading factors. In this paper we present GNUsmail, an open-source, extensible framework for email classification, and focus on its ability to perform online evaluation. GNUsmail's architecture supports incremental and online learning, and it can be used to compare different online mining methods using state-of-the-art evaluation metrics. We show how GNUsmail can be used to compare different algorithms, and we describe the tool it includes for launching replicable experiments.
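To make the evaluation idea concrete, the following is a minimal Python sketch of the prequential error with a fading factor. It assumes one common formulation from the data-stream literature, in which the faded loss sum S_i = L_i + alpha * S_(i-1) is divided by the faded count B_i = 1 + alpha * B_(i-1). The function name `prequential_error` and the sample losses are hypothetical illustrations and are not taken from GNUsmail.

```python
from typing import Iterable, List


def prequential_error(losses: Iterable[float], alpha: float = 1.0) -> List[float]:
    """Return the running prequential error after each example.

    losses: per-example losses obtained test-then-train
            (predict on the example first, then learn from it).
    alpha:  fading factor in (0, 1]; alpha = 1.0 gives the plain
            prequential error, i.e. the mean of all losses so far.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    faded_loss = 0.0   # S_i = L_i + alpha * S_{i-1}
    faded_count = 0.0  # B_i = 1 + alpha * B_{i-1}
    errors = []
    for loss in losses:
        faded_loss = loss + alpha * faded_loss
        faded_count = 1.0 + alpha * faded_count
        errors.append(faded_loss / faded_count)
    return errors


if __name__ == "__main__":
    # Hypothetical 0/1 losses: wrong on the first four messages, right afterwards.
    losses = [1, 1, 1, 1, 0, 0, 0, 0]
    print(prequential_error(losses, alpha=1.0)[-1])  # plain prequential error
    print(prequential_error(losses, alpha=0.5)[-1])  # faded: old mistakes count less
```

With alpha = 1.0 every past mistake weighs the same, so early errors keep inflating the estimate long after the classifier has improved. A smaller alpha discounts old losses geometrically, which lets the estimate follow recent performance more closely under concept drift.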