Watermarking the outputs of structured prediction with an application in statistical machine translation

Authors:
Ashish Venugopal;Jakob Uszkoreit;David Talbot;Franz J. Och;Juri Ganitkevitch
Affiliations:
Google, Inc., Amphitheatre Parkway, Mountain View, CA;Google, Inc., Amphitheatre Parkway, Mountain View, CA;Google, Inc., Amphitheatre Parkway, Mountain View, CA;Google, Inc., Amphitheatre Parkway, Mountain View, CA;Johns Hopkins University, Baltimore, MD
Venue:
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2011

Abstract

We propose a general method to watermark and probabilistically identify the structured outputs of machine learning algorithms. Our method is robust to local editing operations and provides well-defined trade-offs between the ability to identify algorithm outputs and the quality of the watermarked output. Unlike previous work in the field, our approach does not rely on controlling the inputs to the algorithm and provides probabilistic guarantees on the ability to identify collections of results from one's own algorithm. We present an application in statistical machine translation, where machine-translated output is watermarked at minimal loss in translation quality and detected with high recall.
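
The abstract does not spell out the mechanism, so the following is only a minimal illustrative sketch of how a quality-versus-detectability trade-off and a collection-level probabilistic test could look. It is not the authors' implementation. Everything in it is an assumption made for illustration: the use of SHA-256 as the hash, the 64-bit sequence length, the quality-loss threshold, and the hypothetical names `bit_sequence`, `choose_watermarked`, and `detection_p_value`. Because it hashes each output as a whole string, this sketch would not by itself be robust to local edits.

```python
import hashlib
import math


def bit_sequence(text: str, n_bits: int = 64) -> list[int]:
    """Map a string to a pseudo-random bit sequence (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    bits = [(byte >> shift) & 1 for byte in digest for shift in range(8)]
    return bits[:n_bits]  # SHA-256 yields 256 bits, so n_bits must be <= 256


def ones(text: str, n_bits: int = 64) -> int:
    """Count the 1 bits in the hashed bit sequence of a string."""
    return sum(bit_sequence(text, n_bits))


def choose_watermarked(candidates, max_quality_loss: float, n_bits: int = 64) -> str:
    """Pick an output from (text, score) candidates, where higher score is better.

    Only candidates within max_quality_loss of the best score are eligible.
    Among those, the one whose hash has the most 1 bits is returned, with
    ties broken in favor of the higher score.
    """
    best_score = max(score for _, score in candidates)
    eligible = [(t, s) for t, s in candidates if best_score - s <= max_quality_loss]
    return max(eligible, key=lambda c: (ones(c[0], n_bits), c[1]))[0]


def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def detection_p_value(texts, n_bits: int = 64) -> float:
    """One-sided p-value for 'these texts carry no bias toward 1 bits'.

    Under the null hypothesis (unwatermarked text), each hashed bit is
    assumed to be a fair coin flip, so the total count is binomial.
    """
    total_bits = n_bits * len(texts)
    total_ones = sum(ones(t, n_bits) for t in texts)
    return binomial_tail(total_ones, total_bits)
```

In this sketch, raising `max_quality_loss` admits more candidates and therefore a stronger bias toward 1 bits, at the cost of sometimes returning a lower-scoring output. Setting it to zero restricts the choice to the top-scoring outputs, so only ties for the best score can carry a watermark. The p-value shrinks as more watermarked outputs are pooled, which is one way a guarantee over collections of results, rather than single outputs, could arise.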