This paper addresses personal E-mail filtering by casting it in the framework of text classification. E-mail messages are modeled as semi-structured documents: each consists of a set of fields with predefined semantics and a number of variable-length free-text fields. While most work on classification concentrates on either structured data or free text, the work in this paper deals with both. To perform classification, a naive Bayesian classifier was designed and implemented, and a decision-tree-based classifier was implemented. The design considerations and implementation issues are discussed. A comprehensive comparative study of the two classifiers was conducted on a relatively large amount of real personal E-mail data. The importance of different features is reported, and results on other issues related to building an effective personal E-mail classifier are presented and discussed. Both classifiers are shown to perform filtering with reasonable accuracy. While the decision-tree-based classifier outperforms the Bayesian classifier when features and training size are selected optimally for both, a carefully designed naive Bayesian classifier is more robust.
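To make the idea of classifying semi-structured messages concrete, the following is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. It is not the classifier described in the paper; the field names (`from`, `subject`, `body`), the feature-tagging scheme, the folder labels, and the example messages are all assumptions chosen for illustration. A structured field is kept as a single token, while free-text fields are split into words tagged with the field they came from.

```python
import math
from collections import Counter, defaultdict


def features(msg):
    """Turn a message dict into feature tokens (hypothetical scheme):
    the sender field is kept whole, free-text fields are split into
    lowercase words tagged with their field name."""
    feats = [f"from={msg['from'].lower()}"]
    feats += [f"subject:{w}" for w in msg["subject"].lower().split()]
    feats += [f"body:{w}" for w in msg["body"].lower().split()]
    return feats


class NaiveBayes:
    def fit(self, messages, folders):
        self.class_counts = Counter(folders)      # messages per folder
        self.feat_counts = defaultdict(Counter)   # feature counts per folder
        self.vocab = set()
        for msg, folder in zip(messages, folders):
            for f in features(msg):
                self.feat_counts[folder][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, msg):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c, n in self.class_counts.items():
            # Add-one (Laplace) smoothing over the training vocabulary.
            denom = sum(self.feat_counts[c].values()) + len(self.vocab)
            score = math.log(n / total)
            for f in features(msg):
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((self.feat_counts[c][f] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


# Made-up training data: two folders, two messages each.
train = [
    {"from": "boss@example.com", "subject": "project deadline",
     "body": "please send the report"},
    {"from": "boss@example.com", "subject": "meeting",
     "body": "project status meeting tomorrow"},
    {"from": "friend@example.org", "subject": "weekend",
     "body": "dinner on saturday"},
    {"from": "friend@example.org", "subject": "party",
     "body": "party this weekend"},
]
folders = ["work", "work", "personal", "personal"]

model = NaiveBayes().fit(train, folders)
new_msg = {"from": "boss@example.com", "subject": "report",
           "body": "project report"}
predicted = model.predict(new_msg)
```

Tagging each word with its field lets the same word carry different weight in the subject and in the body, which is one simple way to use both the structured and the free-text parts of a message in a single model.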