Detecting word substitutions: PMI vs. HMM

Authors:
Dmitri Roussinov;SzeWang Fong;David Skillicorn
Affiliations:
Arizona State University, Tempe, AZ;Queen's University, Kingston, ON, Canada;Queen's University, Kingston, ON, Canada
Venue:
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2007

Citing 4
Cited 0

Factored language models and generalized parallel backoff
NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2

Mining context specific similarity relationships using the world wide web
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

Measures to detect word substitution in intercepted communication
ISI'06 Proceedings of the 4th IEEE international conference on Intelligence and Security Informatics

Beyond keyword filtering for message and conversation detection
ISI'05 Proceedings of the 2005 IEEE international conference on Intelligence and Security Informatics

Abstract

Those who want to conceal the content of their communications can do so by replacing words that might trigger attention. For example, instead of writing "The bomb is in position", a terrorist may choose to write "The flower is in position." The substituted sentence would sound a bit "odd" to a human reader, and prior research has shown that such oddity is detectable by text mining approaches. However, the importance of each component in the suggested oddity detection approach has not been thoroughly investigated. Also, the approach has not been compared with an obvious candidate for the task, Hidden Markov Models (HMM). In this work, we further explore the oddity detection algorithms reported earlier, specifically those based on pointwise mutual information (PMI) and Hidden Markov Models (HMM).
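
To make the PMI idea concrete, here is a minimal sketch, not the authors' implementation. It assumes PMI is estimated from sentence-level co-occurrence counts in a tiny hand-made corpus; the function name `pmi_table` and the corpus are hypothetical and chosen only for illustration. A word that rarely or never co-occurs with its context gets a low PMI, which is the kind of "oddity" signal the abstract describes.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Build a pmi(a, b) function from sentence-level co-occurrence counts.

    Assumption: two words "co-occur" if they appear in the same sentence.
    """
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        word_counts.update(words)
        # Count each unordered pair of distinct words once per sentence.
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))

    def pmi(a, b):
        joint = pair_counts[frozenset((a, b))]
        if joint == 0:
            # Never seen together in this toy corpus.
            return float("-inf")
        p_joint = joint / n
        p_a = word_counts[a] / n
        p_b = word_counts[b] / n
        return math.log2(p_joint / (p_a * p_b))

    return pmi


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the bomb is in position",
    "the bomb exploded",
    "the flower is in bloom",
    "the flower smells sweet",
]
pmi = pmi_table(corpus)

print(pmi("bomb", "position"))    # word seen with this context
print(pmi("flower", "position"))  # substituted word, never seen with this context
```

In this sketch the substituted word "flower" never co-occurs with "position", so its PMI with that context word is lower than that of "bomb". A realistic system would need far larger frequency sources and smoothing for unseen pairs.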