Fingerprinting Text in Logical Markup Languages

  • Authors:
  • Christian D. Jensen

  • Venue:
  • ISC '01 Proceedings of the 4th International Conference on Information Security
  • Year:
  • 2001

Abstract

Information hiding is attracting increasing attention from the research community. Most of this research has centered on hiding information, such as watermarks and fingerprints, in images or in digital audio and video signals. Text has generally been treated as a black-and-white image with special properties. All of the current methods of hiding information in text are vulnerable to scanning followed by optical character recognition to reconstruct the text.

Document distribution increasingly relies on logical markup languages like HTML and XML, where the physical presentation of the text is determined by the user's browser. Embedding the watermark in the physical presentation of the document is therefore no longer practical. We argue that embedding syntactic or semantic fingerprints in text is the only viable way to fingerprint documents in logical markup languages such as HTML or XML.

In this paper, we propose a new semantic fingerprinting mechanism based on synonym substitution. This idea is developed into an operational system, and results of preliminary experiments are reported.
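
The general idea of synonym substitution can be illustrated with a minimal sketch. This is not the system described in the paper, whose details the abstract does not give. It assumes a small, hypothetical table of interchangeable word pairs (`SYNONYM_PAIRS`) in which the choice of word encodes one bit, and the function names `embed` and `extract` are likewise illustrative. The sketch treats the input as plain text, so skipping markup tags and preserving capitalization are left out.

```python
import re

# Hypothetical synonym table (an assumption for illustration only).
# In each pair, the first word encodes bit 0 and the second encodes bit 1.
SYNONYM_PAIRS = [("big", "large"), ("quick", "fast"), ("begin", "start")]

# Map each word to (its pair, the bit that word encodes).
WORD_INFO = {}
for pair in SYNONYM_PAIRS:
    for bit, word in enumerate(pair):
        WORD_INFO[word] = (pair, bit)

WORD_RE = re.compile(r"[A-Za-z]+")


def embed(text, bits):
    """Return a copy of text whose substitutable words encode the given bits."""
    remaining = list(bits)

    def substitute(match):
        word = match.group(0)
        info = WORD_INFO.get(word.lower())
        if info is None or not remaining:
            return word  # not substitutable, or fingerprint already embedded
        pair, _ = info
        # Pick the synonym that encodes the next bit (written in lowercase).
        return pair[remaining.pop(0)]

    marked = WORD_RE.sub(substitute, text)
    if remaining:
        raise ValueError("text has too few substitutable words")
    return marked


def extract(text, n_bits):
    """Read back up to n_bits bits from the substitutable words in text."""
    bits = []
    for match in WORD_RE.finditer(text):
        if len(bits) == n_bits:
            break
        info = WORD_INFO.get(match.group(0).lower())
        if info is not None:
            bits.append(info[1])
    return bits


original = "The big dog made a quick dash to begin the game."
copy_a = embed(original, [0, 1, 0])
copy_b = embed(original, [1, 0, 1])
print(copy_a)  # The big dog made a fast dash to begin the game.
print(copy_b)  # The large dog made a quick dash to start the game.
print(extract(copy_b, 3))  # [1, 0, 1]
```

Because the fingerprint lives in the choice of words rather than in the layout, it survives re-rendering by a browser and retyping or optical character recognition of the text. A practical system would also need to check that each substitution preserves meaning in context, which this sketch does not attempt.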