Detection of substitution-based linguistic steganography by relative frequency analysis

Authors:
Zhili Chen;Liusheng Huang;Wei Yang
Affiliations:
NHPCC, School of CS. & Tech., USTC, Hefei 230027, China and Suzhou Institute for Advanced Study, USTC, Suzhou 215123, China;NHPCC, School of CS. & Tech., USTC, Hefei 230027, China and Suzhou Institute for Advanced Study, USTC, Suzhou 215123, China;NHPCC, School of CS. & Tech., USTC, Hefei 230027, China and Suzhou Institute for Advanced Study, USTC, Suzhou 215123, China
Venue:
Digital Investigation: The International Journal of Digital Forensics & Incident Response
Year:
2011

Abstract

Linguistic steganography hides information in natural language texts. As natural language texts grow in importance and quantity, linguistic steganography plays an increasingly important role in information security (IS). Substitution-based linguistic steganography is one of the most commonly used methods, offering considerable security and simplicity. In this paper, we propose a straightforward method based on Relative Frequency Analysis (RFA), which uses the frequency characteristics of the testing texts (the texts being tested) to detect substitution-based linguistic steganography. We formally prove several properties of relative frequency that can be used in the detection process and propose a detection scheme. As an example, we examine an existing synonym-substitution system, T-Lex, and carry out a detection experiment. In the experiment with pure literature texts, the accuracy, precision and recall of the detection reach 98.64%, 97.77% and 99.55%, respectively, when the substitution count is 90; in the experiment with balanced texts, the highest detection accuracy is 95%. These results indicate that the detection scheme is promising.
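
For readers unfamiliar with the three figures reported above, the following minimal sketch shows how accuracy, precision and recall are conventionally computed for a binary detector, with stego texts as the positive class. The function name `detection_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the confusion counts from the paper's experiments, which the abstract does not give.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) for a binary detector.

    tp: stego texts flagged as stego       (true positives)
    fp: cover texts flagged as stego       (false positives)
    tn: cover texts passed as cover        (true negatives)
    fn: stego texts passed as cover        (false negatives)
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything flagged as stego, how much really was stego.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all stego texts, how many the detector caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts, for illustration only (not taken from the paper).
acc, prec, rec = detection_metrics(tp=90, fp=10, tn=85, fn=15)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f}")
```

A recall higher than precision, as in the reported literature-text result, means the detector misses few stego texts but occasionally flags a clean cover text.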