On effective e-mail classification via neural networks

Authors:
Bin Cui;Anirban Mondal;Jialie Shen;Gao Cong;Kian-Lee Tan
Affiliations:
Singapore-MIT Alliance, National University of Singapore;University of Tokyo, Japan;University of New South Wales, Australia;The University of Edinburgh, UK;Singapore-MIT Alliance, National University of Singapore
Venue:
DEXA'05 Proceedings of the 16th international conference on Database and Expert Systems Applications
Year:
2005

Abstract

To address the growing problem of junk E-mail on the Internet, this paper proposes an effective E-mail classification and cleansing method. E-mail messages can be modelled as semi-structured documents consisting of a set of fields with pre-defined semantics and a number of variable-length free-text fields. The proposed method handles both the fields with pre-defined semantics and the variable-length free-text fields in order to obtain higher accuracy. The main contributions of this work are two-fold. First, we present a new model based on the Neural Network (NN) for classifying personal E-mails. In particular, we treat E-mail files as a particular kind of plain-text file, which implies that the feature set is relatively large (since thousands of different terms occur across E-mail files). Second, we propose using Principal Component Analysis (PCA) as a preprocessor for the NN to reduce the data in both size and dimensionality, so that the input data become more classifiable and the training process of the NN model converges faster. The results of our performance evaluation demonstrate that the proposed algorithm is effective, performing filtering with reasonable accuracy.
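
The pipeline outlined in the abstract (term features, then PCA, then a neural network) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy messages and labels are invented for illustration, the helper names (`term_matrix`, `pca_fit`, `pca_transform`, `train_mlp`) are hypothetical, PCA is approximated here by power iteration with deflation, and the network is assumed to be a single hidden layer trained by plain stochastic gradient descent.

```python
import math
import random
from collections import Counter


def term_matrix(texts, vocab=None):
    """Bag-of-words counts, one row per message; unknown terms are ignored."""
    if vocab is None:
        vocab = sorted({w for t in texts for w in t.lower().split()})
    rows = []
    for t in texts:
        counts = Counter(t.lower().split())
        rows.append([float(counts[w]) for w in vocab])
    return rows, vocab


def pca_fit(rows, k, iters=200, seed=0):
    """Approximate the top-k principal components (power iteration + deflation)."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - mean[j] for j in range(d)] for r in rows]  # centered copy
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        for _ in range(iters):
            # w = X^T (X v), without forming the d x d covariance matrix
            proj = [sum(a * b for a, b in zip(x, v)) for x in X]
            w = [sum(X[i][j] * proj[i] for i in range(n)) for j in range(d)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm < 1e-12:
                break
            v = [c / norm for c in w]
        comps.append(v)
        for x in X:  # deflate: remove the direction just found from every row
            p = sum(a * b for a, b in zip(x, v))
            for j in range(d):
                x[j] -= p * v[j]
    return mean, comps


def pca_transform(rows, mean, comps):
    """Project rows onto the fitted components (the reduced NN input)."""
    return [[sum((r[j] - mean[j]) * c[j] for j in range(len(mean))) for c in comps]
            for r in rows]


def sigmoid(x):
    return 1 / (1 + math.exp(-x)) if x >= 0 else math.exp(x) / (1 + math.exp(x))


def train_mlp(Z, y, hidden=4, epochs=300, lr=0.1, seed=0):
    """One tanh hidden layer, sigmoid output, cross-entropy loss, plain SGD."""
    rng = random.Random(seed)
    d = len(Z[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = [0.0]

    def forward(z):
        h = [math.tanh(sum(w * a for w, a in zip(W1[i], z)) + b1[i])
             for i in range(hidden)]
        return h, sigmoid(sum(w * a for w, a in zip(W2, h)) + b2[0])

    for _ in range(epochs):
        for z, t in zip(Z, y):
            h, o = forward(z)
            g = o - t  # gradient of the loss w.r.t. the output pre-activation
            for i in range(hidden):
                gh = g * W2[i] * (1 - h[i] ** 2)  # uses W2[i] before its update
                W2[i] -= lr * g * h[i]
                b1[i] -= lr * gh
                for j in range(d):
                    W1[i][j] -= lr * gh * z[j]
            b2[0] -= lr * g
    return lambda z: forward(z)[1]


# Toy data, invented for illustration only (1 = junk, 0 = legitimate).
texts = ["win free money now", "free prize claim now",
         "cheap money win prize", "claim free cash now",
         "meeting agenda for monday", "project report attached",
         "lunch meeting on monday", "please review project report"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

rows, vocab = term_matrix(texts)
mean, comps = pca_fit(rows, k=2)       # reduce the term space to 2 dimensions
Z = pca_transform(rows, mean, comps)
predict = train_mlp(Z, labels)
train_probs = [predict(z) for z in Z]

# Scoring an unseen message reuses the fitted vocabulary, mean and components.
new_rows, _ = term_matrix(["free money prize"], vocab)
new_prob = predict(pca_transform(new_rows, mean, comps)[0])
```

In this sketch the network sees only two inputs per message instead of one per vocabulary term, which is the kind of size and dimensionality reduction the abstract attributes to the PCA preprocessing step. A real system would also need to handle the structured header fields, which this example leaves out.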