Using latent semantic indexing to filter spam

Authors:
Kevin R. Gee
Affiliations:
The University of Texas at Arlington, Arlington, TX
Venue:
Proceedings of the 2003 ACM symposium on Applied computing
Year:
2003


Abstract

Past research has explored the effectiveness of a Naïve Bayesian classifier in filtering unsolicited bulk email (spam). Results have shown that this approach generally achieves higher precision than recall. This study evaluates the effectiveness of a classifier incorporating Latent Semantic Indexing (LSI) to filter spam email on a corpus used in previous studies. Results show that email classifiers using LSI to filter spam achieve very high recall and precision, regardless of whether the corpus is treated using a stop list or a lemmatizer. While LSI yields precision roughly equal to that of a Naïve Bayesian approach, it has substantially higher recall and is more effective under certain conditions. Results show that incorporating LSI into an anti-spam filter is viable, particularly in implementations where misclassified legitimate messages are not arbitrarily deleted. Further inferences are drawn about the applicability of this method to other text mining tasks.
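
The abstract compares classifiers by precision and recall. As a minimal sketch of how those two measures are computed for a spam filter, the Python below counts true positives, false positives, and false negatives with spam as the positive class. The function name `precision_recall` and the eight toy labels are hypothetical, chosen only for illustration; they are not data or code from the study.

```python
def precision_recall(true_labels, predicted_labels, positive="spam"):
    """Return (precision, recall) for the given positive class."""
    tp = fp = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive and truth == positive:
            tp += 1  # spam correctly flagged as spam
        elif guess == positive and truth != positive:
            fp += 1  # legitimate message wrongly flagged as spam
        elif guess != positive and truth == positive:
            fn += 1  # spam that slipped through the filter
    # Guard against division by zero when a count is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels (assumption, not results from the paper):
# the filter flags only two of four spam messages and no legitimate mail.
true_labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]
predicted_labels = ["spam", "spam", "ham", "ham", "ham", "ham", "ham", "ham"]

precision, recall = precision_recall(true_labels, predicted_labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every message flagged as spam really is spam, so precision is perfect, while half of the spam goes undetected, so recall is lower. That is the kind of imbalance the abstract attributes to earlier Naïve Bayesian results.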