Words are not enough: sentence level natural language watermarking

Authors:
Mercan Topkara;Umut Topkara;Mikhail J. Atallah
Affiliations:
Purdue University, West Lafayette, IN;Purdue University, West Lafayette, IN;Purdue University, West Lafayette, IN
Venue:
Proceedings of the 4th ACM international workshop on Contents protection and security
Year:
2006

Citing (9)

Natural language processing for information assurance and security: an overview and implementations
Proceedings of the 2000 workshop on New security paradigms

Plausible Deniability Using Automated Linguistic Stegonagraphy
InfraSec '02 Proceedings of the International Conference on Infrastructure Security

Natural Language Watermarking: Design, Analysis, and a Proof-of-Concept Implementation
IHW '01 Proceedings of the 4th International Workshop on Information Hiding

Natural Language Watermarking and Tamperproofing
IH '02 Revised Papers from the 5th International Workshop on Information Hiding

Principles of Context-Based Machine Translation Evaluation
Machine Translation

A fast and portable realizer for text generation systems
ANLC '97 Proceedings of the fifth conference on Applied natural language processing

BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

The hiding virtues of ambiguity: quantifiably resilient watermarking of natural language text through synonym substitutions
MM&Sec '06 Proceedings of the 8th workshop on Multimedia and security

Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
HLT '02 Proceedings of the second international conference on Human Language Technology Research

Cited by (7)

Natural language watermarking via morphosyntactic alterations
Computer Speech and Language

Authenticating Binary Text Documents Using a Localising OMAC Watermark Robust to Printing and Scanning
IWDW '07 Proceedings of the 6th International Workshop on Digital Watermarking

Text watermarking by syntactic analysis
ICCOMP'08 Proceedings of the 12th WSEAS international conference on Computers

Linguistic steganography using automatically generated paraphrases
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Practical linguistic steganography using contextual synonym substitution and vertex colour coding
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Adaptive-capacity and robust natural language watermarking for agglutinative languages
Security and Communication Networks

Natural language watermarking for german texts
Proceedings of the first ACM workshop on Information hiding and multimedia security

Abstract

Compared to other media, natural language text presents unique challenges for information hiding. These challenges require the design of a robust algorithm that can work under the following constraints: (i) low embedding bandwidth, i.e., the number of sentences is comparable with the message length; (ii) not all transformations can be applied to a given sentence; (iii) the number of alternative forms for a sentence is relatively small, a limitation governed by the grammar and vocabulary of the natural language, as well as the requirement to preserve the style and fluency of the document. The adversary can carry out all the transformations used for embedding to remove the embedded message. In addition, the adversary can permute the sentences, select and use a subset of sentences, and insert new sentences. We give a scheme that overcomes these challenges, together with a partial implementation and its evaluation for the English language. The present application of this scheme works at the sentence level while also using a word-level watermarking technique that was recently designed and built into a fully automatic system ("Equimark"). Unlike Equimark, whose resilience relied on the introduction of ambiguities, the present paper's sentence-level technique is more tuned to situations where very little change to the text is allowable (i.e., when style is important). Secondarily, this paper shows how to use lower-level (in this case word-level) marking to improve the resilience and embedding properties of higher-level (in this case sentence-level) schemes. We achieve this by using the word-based methods as a separate channel from the sentence-based methods, thereby improving on the results of either one alone. The sentence-level watermarking technique we introduce is novel and powerful, as it relies on multiple features of each sentence and exploits the notion of orthogonality between features.
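
Constraints (ii) and (iii) in the abstract can be made concrete with a toy sketch. The Python below is not the paper's scheme. It only illustrates the general idea of carrying one bit per sentence by choosing among a small set of alternative forms. The feature used (word count), the HMAC-based bit, and the names keyed_bit, word_count_feature, embed_bit and extract_bit are all assumptions made for illustration, and the alternative forms are supplied by hand rather than produced by real syntactic transformations.

```python
import hashlib
import hmac
from typing import Optional, Sequence


def keyed_bit(key: bytes, feature_value: str) -> int:
    """Map a feature value to one pseudo-random bit under a secret key."""
    digest = hmac.new(key, feature_value.encode("utf-8"), hashlib.sha256).digest()
    return digest[0] & 1


def word_count_feature(sentence: str) -> str:
    """Toy sentence feature (an assumption): number of whitespace-separated words."""
    return str(len(sentence.split()))


def embed_bit(key: bytes, alternatives: Sequence[str], bit: int) -> Optional[str]:
    """Return the first alternative whose keyed feature bit equals `bit`.

    Returns None when no alternative qualifies, which mirrors the constraint
    that a sentence has only a few admissible forms.
    """
    for sentence in alternatives:
        if keyed_bit(key, word_count_feature(sentence)) == bit:
            return sentence
    return None


def extract_bit(key: bytes, sentence: str) -> int:
    """Read the bit back from the sentence alone, using the same key and feature."""
    return keyed_bit(key, word_count_feature(sentence))


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical secret key
    forms = [
        "The committee approved the plan.",
        "The plan was approved by the committee.",
    ]
    for bit in (0, 1):
        print(bit, embed_bit(key, forms, bit))
```

In this sketch each bit is read from the sentence alone, so reordering sentences does not change the bit a given sentence carries. How bits are mapped back to message positions, and how deleted or inserted sentences are tolerated, is not addressed here.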