An HMM for detecting spam mail

Authors:
José Gordillo; Eduardo Conde
Affiliations:
Facultad de Matemáticas, Department of Statistics and Operations Research, Universidad de Sevilla, CP 41012 C/Tarfia S/N, Sevilla, Spain;Facultad de Matemáticas, Department of Statistics and Operations Research, Universidad de Sevilla, CP 41012 C/Tarfia S/N, Sevilla, Spain
Venue:
Expert Systems with Applications: An International Journal
Year:
2007


Abstract

Hidden Markov Models (HMMs) have recently been used in bioinformatics to classify DNA and protein chains, giving rise to what are known as Profile Hidden Markov Models. In this paper, we show that these models can also be adapted to the problem of classifying misspelled words by identifying their primary structure through statistical tools. This process leads to a new learning algorithm based on parametrizing the set of recognizable words so that any misspelled form of these words can be detected. As an application, we describe a method for classifying spam e-mail by detecting adulterated forms of words taken from a blacklist of words frequently used by spammers.
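
To make the idea in the abstract concrete, the following is a minimal sketch of how a profile HMM built around a single blacklisted word could score a possibly adulterated token. It is not the learning algorithm of the paper: the transition and emission probabilities are hand-set assumptions rather than learned parameters, the alphabet size of 43 is arbitrary, and the names `viterbi_log_prob`, `log_odds` and `match_blacklist` are hypothetical. The score is the Viterbi (best-path) log-probability of the token under the word's profile, compared with a uniform random-character model.

```python
import math

NEG_INF = float("-inf")
ALPHABET_SIZE = 43   # assumption: letters, digits and a few symbols
P_MATCH = 0.9        # assumption: chance a match state emits its own letter

# Hand-set transition probabilities (assumptions, not learned values).
# Keys are "from state type" + "to state type": M=match, I=insert, D=delete.
LOG_T = {k: math.log(v) for k, v in {
    "MM": 0.90, "MI": 0.05, "MD": 0.05,
    "IM": 0.60, "II": 0.30, "ID": 0.10,
    "DM": 0.60, "DI": 0.10, "DD": 0.30,
}.items()}

LOG_EMIT_INSERT = math.log(1.0 / ALPHABET_SIZE)  # inserts emit uniformly


def log_emit_match(expected, observed):
    """Log-probability that the match state for `expected` emits `observed`."""
    if observed == expected:
        return math.log(P_MATCH)
    return math.log((1.0 - P_MATCH) / (ALPHABET_SIZE - 1))


def viterbi_log_prob(word, observed):
    """Best-path log-probability of `observed` under the profile of `word`."""
    n, length = len(observed), len(word)
    vm = [[NEG_INF] * (length + 1) for _ in range(n + 1)]  # match states
    vi = [[NEG_INF] * (length + 1) for _ in range(n + 1)]  # insert states
    vd = [[NEG_INF] * (length + 1) for _ in range(n + 1)]  # delete states
    vm[0][0] = 0.0  # the begin state is treated as match state 0
    for i in range(n + 1):
        for j in range(length + 1):
            if i > 0 and j > 0:  # consume one character in match state j
                best = max(vm[i - 1][j - 1] + LOG_T["MM"],
                           vi[i - 1][j - 1] + LOG_T["IM"],
                           vd[i - 1][j - 1] + LOG_T["DM"])
                vm[i][j] = best + log_emit_match(word[j - 1], observed[i - 1])
            if i > 0:            # consume one character in insert state j
                best = max(vm[i - 1][j] + LOG_T["MI"],
                           vi[i - 1][j] + LOG_T["II"],
                           vd[i - 1][j] + LOG_T["DI"])
                vi[i][j] = best + LOG_EMIT_INSERT
            if j > 0:            # skip profile position j without emitting
                vd[i][j] = max(vm[i][j - 1] + LOG_T["MD"],
                               vi[i][j - 1] + LOG_T["ID"],
                               vd[i][j - 1] + LOG_T["DD"])
    return max(vm[n][length], vi[n][length], vd[n][length])


def log_odds(word, token):
    """Viterbi score of `token` relative to a uniform random-character model."""
    token = token.lower()
    return viterbi_log_prob(word, token) - len(token) * LOG_EMIT_INSERT


def match_blacklist(token, blacklist, threshold=0.0):
    """Return the best-scoring blacklisted word above `threshold`, else None."""
    best_word = max(blacklist, key=lambda w: log_odds(w, token), default=None)
    if best_word is not None and log_odds(best_word, token) > threshold:
        return best_word
    return None
```

A call such as `match_blacklist("V1AGRA", ["casino", "viagra"])` compares the token against each word's profile and keeps the best one if its log-odds exceeds the threshold. The threshold of 0 is only a placeholder; in practice it, like the probabilities above, would have to be tuned or learned from data.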