A suffix tree approach to anti-spam email filtering

Authors:
Rajesh Pampapathi; Boris Mirkin; Mark Levene
Affiliations:
School of Computer Science and Information Systems, Birkbeck College, University of London, London (all three authors)
Venue:
Machine Learning
Year:
2006

Abstract

We present an approach to email filtering based on the suffix tree data structure. We develop a method for scoring emails with the suffix tree and test a number of scoring and score-normalisation functions. Our results show that the character-level representation of emails and classes that the suffix tree provides can significantly improve classification accuracy compared with currently popular methods such as naive Bayes. We believe the method can be extended to the classification of documents in other domains.
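To make the idea of a character-level class representation concrete, here is a minimal sketch. It assumes one depth-limited suffix trie per class, which is an uncompressed stand-in for a true suffix tree. Each node records how often its substring occurs in that class's training emails. The scoring function (summing the conditional frequencies count(child)/count(parent) along each suffix's longest matching path) and the normalisation (dividing by email length) are illustrative choices only, not the exact functions evaluated in the paper. The names `SuffixProfile`, `MAX_DEPTH` and `classify` are hypothetical.

```python
MAX_DEPTH = 8  # hypothetical cap on indexed substring length; keeps the trie small


class Node:
    """Trie node; `count` is how many inserted suffixes pass through it."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0


class SuffixProfile:
    """Character-level profile of one class (e.g. spam or ham).

    Every suffix of every training email, truncated at max_depth characters,
    is inserted, so each node's count is the frequency of its substring.
    """

    def __init__(self, max_depth=MAX_DEPTH):
        self.root = Node()
        self.max_depth = max_depth

    def add(self, text):
        for i in range(len(text)):
            node = self.root
            node.count += 1  # root counts the suffixes inserted
            for ch in text[i:i + self.max_depth]:
                node = node.children.setdefault(ch, Node())
                node.count += 1

    def score(self, text):
        """Hypothetical score: for every suffix of `text`, walk down the trie
        while characters match, adding count(child) / count(parent) at each
        step; then normalise by the number of suffixes (len(text))."""
        if not text:
            return 0.0
        total = 0.0
        for i in range(len(text)):
            node = self.root
            for ch in text[i:i + self.max_depth]:
                child = node.children.get(ch)
                if child is None:
                    break  # longest match for this suffix has ended
                total += child.count / node.count
                node = child
        return total / len(text)


def classify(text, profiles):
    """Return the label whose profile gives `text` the highest score."""
    return max(profiles, key=lambda label: profiles[label].score(text))


# Usage with toy data (not from the paper):
spam, ham = SuffixProfile(), SuffixProfile()
for t in ["buy cheap viagra now", "cheap meds buy now"]:
    spam.add(t)
for t in ["meeting notes attached", "see you at the meeting"]:
    ham.add(t)
print(classify("buy now cheap", {"spam": spam, "ham": ham}))  # prints: spam
```

Inserting every truncated suffix costs O(n · MAX_DEPTH) per email of length n. A fuller implementation would more likely build a compacted suffix tree, for example with Ukkonen's linear-time algorithm, but the trie keeps the counting logic easy to follow.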