Dynamically weighted hidden Markov model for spam deobfuscation

Authors:
Seunghak Lee;Iryoung Jeong;Seungjin Choi
Affiliations:
Department of Chemistry, POSTECH, Korea;Department of Computer Science Education, Korea University, Korea;Department of Computer Science, POSTECH, Korea
Venue:
IJCAI'07 Proceedings of the 20th International Joint Conference on Artificial Intelligence
Year:
2007

Abstract

Spam deobfuscation is the process of detecting obfuscated words that appear in spam emails and converting them back to the original words for correct recognition. The lexicon tree hidden Markov model (LT-HMM) was recently shown to be useful in spam deobfuscation. However, LT-HMM suffers from a huge number of states, which is undesirable for practical applications. In this paper we present a complexity-reduced HMM, referred to as the dynamically weighted HMM (DW-HMM), in which states that share the same emission probability are grouped into super-states while the state transition probabilities of the original HMM are preserved. DW-HMM dramatically reduces the number of states, and its state transition probabilities are determined in the decoding phase. We illustrate how to convert an LT-HMM into its associated DW-HMM. We confirm the useful behavior of DW-HMM in the task of spam deobfuscation, showing that it significantly reduces the number of states while maintaining high accuracy.
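
The state reduction described in the abstract can be illustrated with a rough sketch. It is not the authors' implementation. It assumes that each node of a lexicon tree stands for one HMM state, and that states emitting the same character share the same emission probability and can therefore be merged into one super-state. The toy lexicon and the helper names (`build_lexicon_tree`, `collect_states`, `group_into_super_states`) are hypothetical, and the sketch covers only the state counts, not the transition weights determined during decoding.

```python
from collections import defaultdict


def build_lexicon_tree(words):
    """Build a trie as nested dicts; every non-root node is one state."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root


def collect_states(tree, prefix=""):
    """Return one (path, emitted_char) pair per trie node."""
    states = []
    for ch, child in tree.items():
        path = prefix + ch
        states.append((path, ch))
        states.extend(collect_states(child, path))
    return states


def group_into_super_states(states):
    """Merge states that emit the same character (assumption: same emission)."""
    groups = defaultdict(list)
    for path, ch in states:
        groups[ch].append(path)
    return dict(groups)


# Hypothetical toy lexicon, chosen only for illustration.
lexicon = ["cat", "car", "cart", "dog", "dot"]
states = collect_states(build_lexicon_tree(lexicon))
super_states = group_into_super_states(states)

print("tree states:", len(states))
print("super-states:", len(super_states))
```

In this toy case the tree has nine states but only seven distinct emitted characters, so the grouped model is smaller. The number of super-states is bounded by the alphabet size, while the tree keeps growing with the lexicon, so the gap would be expected to widen for a realistic lexicon.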