We introduce a novel approach to incremental e-mail categorization based on identifying and exploiting "clumps" of messages that are classified similarly. Clumping reflects the local coherence of a classification scheme and matters most when the classification scheme changes dynamically, as it does in e-mail categorization. We propose several metrics to quantify the degree of clumping in a series of messages. We then present several fast, incremental methods for categorizing messages and relate their performance to the measured clumping in each dataset, showing how the methods exploit clumping. The methods are tested on seven large real-world e-mail datasets, one for each of seven users from the Enron corpus, in which each message is filed in exactly one folder. Our methods perform well, achieving accuracy comparable to several common machine learning algorithms at much lower computational cost.
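The abstract does not define the proposed clumping metrics, so the following is only a hypothetical illustration of how the degree of clumping in a folder-labelled message sequence might be quantified. It assumes "clumps" can be approximated by maximal runs of consecutive messages filed in the same folder; the function names `run_lengths`, `adjacent_agreement` and `mean_run_length` are invented here and are not the paper's metrics.

```python
from itertools import groupby
from typing import Hashable, List, Sequence


def run_lengths(labels: Sequence[Hashable]) -> List[int]:
    """Lengths of maximal runs of consecutive identical folder labels.

    Hypothetical proxy for "clumps": assumes a clump is a run of
    consecutive messages filed in the same folder.
    """
    return [len(list(group)) for _, group in groupby(labels)]


def adjacent_agreement(labels: Sequence[Hashable]) -> float:
    """Fraction of consecutive message pairs filed in the same folder.

    This equals the accuracy of a naive incremental classifier that
    predicts each message's folder to be the previous message's folder,
    which shows how strong clumping can be exploited cheaply.
    """
    if len(labels) < 2:
        return 0.0
    same = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return same / (len(labels) - 1)


def mean_run_length(labels: Sequence[Hashable]) -> float:
    """Average clump size under the run-based approximation."""
    runs = run_lengths(labels)
    return sum(runs) / len(runs) if runs else 0.0


if __name__ == "__main__":
    # Toy folder sequence (invented data, not from the Enron corpus).
    folders = ["work", "work", "work", "travel", "travel", "work", "family", "family"]
    print(run_lengths(folders))         # [3, 2, 1, 2]
    print(adjacent_agreement(folders))  # 4/7, about 0.571
    print(mean_run_length(folders))     # 2.0
```

The two measures are linked: for n messages forming r runs, `adjacent_agreement` is (n - r) / (n - 1), so longer clumps directly raise the accuracy of even the simplest "same folder as last time" predictor.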