Exploiting concept clumping for efficient incremental e-mail categorization

Authors:
Alfred Krzywicki;Wayne Wobcke
Affiliations:
School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia;School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia
Venue:
ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications - Volume Part II
Year:
2010

Abstract

We introduce a novel approach to incremental e-mail categorization based on identifying and exploiting "clumps" of messages that are classified similarly. Clumping reflects the local coherence of a classification scheme and is particularly important in settings where the classification scheme changes dynamically, as it does in e-mail categorization. We propose a number of metrics to quantify the degree of clumping in a series of messages. We then present a number of fast, incremental methods for categorizing messages and compare their performance with measures of the clumping in the datasets to show how these methods exploit clumping. The methods are tested on 7 large real-world e-mail datasets, one for each of 7 users from the Enron corpus, in which each message is classified into one folder. We show that our methods perform well, providing accuracy comparable to that of several common machine learning algorithms but with much greater computational efficiency.