Disinformation techniques for entity resolution

Authors:
Steven Euijong Whang; Hector Garcia-Molina
Affiliations:
Stanford University, Stanford, CA, USA; Stanford University, Stanford, CA, USA
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013

Abstract

We study the problem of disinformation. We assume that an "agent" has some sensitive information that the "adversary" is trying to obtain. For example, a camera company (the agent) may secretly be developing its new camera model, and a user (the adversary) may want to know in advance the detailed specs of the model. The agent's goal is to disseminate false information to "dilute" what is known by the adversary. We model the adversary as an Entity Resolution (ER) process that pieces together available information. We formalize the problem of finding the disinformation with the highest benefit given a limited budget for creating the disinformation and propose efficient algorithms for solving the problem. We then evaluate our disinformation planning algorithms on real and synthetic data and compare the robustness of existing ER algorithms. In general, our disinformation techniques can be used as a framework for testing ER robustness.
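
To make the setting in the abstract concrete, the following is a minimal toy sketch, not the authors' algorithm. It assumes the adversary runs a simple single-link ER process over token sets with a Jaccard threshold, that "benefit" is the number of unrelated real records that end up merged with the target record, and that the agent picks disinformation records greedily by benefit gained per unit cost. All names here (`resolve`, `dilution`, `greedy_plan`, the sample records, the costs, and the threshold) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two non-empty token sets."""
    return len(a & b) / len(a | b)


def resolve(records, threshold):
    """Toy ER (an assumption): single-link clustering with union-find.

    Returns one cluster label per record.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if jaccard(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(records))]


def dilution(real, fakes, target_idx, threshold):
    """Assumed benefit: real non-target records merged with the target."""
    labels = resolve(real + fakes, threshold)
    return sum(
        1
        for i in range(len(real))
        if i != target_idx and labels[i] == labels[target_idx]
    )


def greedy_plan(real, target_idx, candidates, budget, threshold):
    """Greedily add (record, cost) candidates by benefit gain per cost."""
    chosen, spent = [], 0
    remaining = list(candidates)
    while True:
        fakes = [rec for rec, _ in chosen]
        base = dilution(real, fakes, target_idx, threshold)
        best, best_ratio = None, 0.0
        for rec, cost in remaining:
            if spent + cost > budget:
                continue  # candidate does not fit in the remaining budget
            gain = dilution(real, fakes + [rec], target_idx, threshold) - base
            if gain / cost > best_ratio:
                best, best_ratio = (rec, cost), gain / cost
        if best is None:
            return chosen
        chosen.append(best)
        spent += best[1]
        remaining.remove(best)


# Hypothetical data: record 0 is the sensitive camera model.
real = [
    frozenset({"camX", "30mp", "wifi"}),
    frozenset({"camY", "20mp", "gps"}),
    frozenset({"camZ", "10mp", "nfc"}),
]
candidates = [
    (frozenset({"camX", "30mp", "camY", "20mp"}), 1),  # bridges records 0 and 1
    (frozenset({"camX", "wifi", "camZ", "nfc"}), 2),   # bridges records 0 and 2
    (frozenset({"foo", "bar"}), 1),                    # matches nothing
]
plan = greedy_plan(real, target_idx=0, candidates=candidates, budget=3, threshold=0.3)
print(plan)
```

In this sketch each chosen fake record acts as a bridge that causes the toy ER process to merge the target with an unrelated record. A greedy ratio rule is only one simple heuristic for a budgeted selection problem and carries no optimality guarantee here.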