Detection of substitution-based linguistic steganography by relative frequency analysis

  • Authors:
  • Zhili Chen; Liusheng Huang; Wei Yang

  • Affiliations:
  • NHPCC, School of CS. & Tech., USTC, Hefei 230027, China and Suzhou Institute for Advanced Study, USTC, Suzhou 215123, China (all three authors)

  • Venue:
  • Digital Investigation: The International Journal of Digital Forensics & Incident Response
  • Year:
  • 2011

Abstract

Linguistic steganography hides information in natural language texts. As natural language texts grow in importance and quantity, linguistic steganography plays an increasingly important role in information security (IS). Substitution-based linguistic steganography is one of the most commonly used linguistic steganography methods, offering considerable security and simplicity. In this paper, we propose a straightforward method based on Relative Frequency Analysis (RFA), which uses the frequency characteristics of the texts being tested, to detect substitution-based linguistic steganography. We formally prove several properties of relative frequency that can be used in the detection process and propose a detection scheme. As an example, we examine an existing synonym-substitution system, T-Lex, and carry out a detection experiment. In the experiment with pure literature texts, the accuracy, precision, and recall of the detection reach 98.64%, 97.77%, and 99.55%, respectively, when the substitution count is 90; in the experiment with balanced texts, the highest detection accuracy is 95%. These results indicate that the detection scheme is promising.
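
The abstract does not define relative frequency, so the following Python sketch is only an illustration of the general idea, not the authors' scheme. It assumes that the relative frequency of a word is its share of all occurrences of its synonym set within the text being tested. The names `SYNONYM_SETS` and `relative_frequencies`, the toy synonym sets, and the sample sentence are all hypothetical.

```python
from collections import Counter

# Hypothetical synonym sets for illustration only; a real system such as
# T-Lex relies on its own synonym dictionary.
SYNONYM_SETS = [
    {"big", "large", "huge"},
    {"quick", "fast", "rapid"},
]


def relative_frequencies(tokens, synonym_sets):
    """Map each observed synonym to its share of its set's occurrences.

    Assumption: relative frequency = count(word) / count(all words in the
    same synonym set), both measured in the text being tested.
    """
    counts = Counter(token.lower() for token in tokens)
    result = {}
    for syn_set in synonym_sets:
        total = sum(counts[word] for word in syn_set)
        if total == 0:
            continue  # no word from this set appears in the text
        for word in syn_set:
            if counts[word]:
                result[word] = counts[word] / total
    return result


text = "the big dog ran fast past the big house and a large fast car"
freqs = relative_frequencies(text.split(), SYNONYM_SETS)
```

Under this assumption, a detector could compare such in-text shares with those expected from natural usage, since synonym substitution tends to shift occurrences toward words that are normally rarer. How the paper actually builds its detection scheme from these quantities is not stated in the abstract.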