Auditing disclosure by relevance ranking

Authors:
Rakesh Agrawal; Alexandre Evfimievski; Jerry Kiernan; Raja Velu
Affiliations:
Microsoft Search Labs, Mountain View, CA; IBM Almaden Research Center, San Jose, CA; IBM Almaden Research Center, San Jose, CA; Yahoo! Inc., Sunnyvale, CA
Venue:
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Year:
2007

Abstract

Numerous widely publicized cases of theft and misuse of private information underscore the need for audit technology to identify the sources of unauthorized disclosure. We present an auditing methodology that ranks potential disclosure sources according to their proximity to the leaked records. Given a sensitive table that contains the disclosed data, our methodology prioritizes by relevance the past queries to the database that could have been used to produce the sensitive table. We provide three conceptually different measures of proximity between the sensitive table and a query result. One measure is inspired by information retrieval in text processing, another is based on statistical record linkage, and the third computes the derivation probability of the sensitive table in a tree-based generative model. We also analyze the characteristics of the three measures and the corresponding ranking algorithms.
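
The ranking idea in the abstract can be illustrated with a minimal sketch. This is not any of the paper's three measures, whose definitions are not given here. It is a simplified stand-in loosely in the spirit of the information-retrieval measure: each past query result is scored by an IDF-weighted overlap with the sensitive table. The function names (`idf_weights`, `proximity`, `rank_queries`), the weighting formula, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def idf_weights(query_results):
    """Assumed weighting: values that appear in fewer query results weigh more."""
    n = len(query_results)
    doc_freq = Counter()
    for rows in query_results.values():
        # Count each distinct cell value once per query result.
        doc_freq.update({value for row in rows for value in row})
    return {v: math.log((1 + n) / (1 + c)) + 1.0 for v, c in doc_freq.items()}


def proximity(sensitive_rows, result_rows, idf):
    """For each sensitive row, take its best weighted overlap with any result row."""
    total = 0.0
    for s_row in sensitive_rows:
        s_values = set(s_row)  # simplification: column positions are ignored
        best = 0.0
        for r_row in result_rows:
            shared = s_values & set(r_row)
            best = max(best, sum(idf.get(v, 0.0) for v in shared))
        total += best
    return total


def rank_queries(sensitive_rows, query_results):
    """Return query ids ordered from most to least proximate to the sensitive table."""
    idf = idf_weights(query_results)
    scores = {q: proximity(sensitive_rows, rows, idf)
              for q, rows in query_results.items()}
    return sorted(scores, key=lambda q: scores[q], reverse=True)


# Hypothetical toy data: a leaked table and the logged results of three past queries.
sensitive = [("alice", "94301", "flu"), ("bob", "94040", "asthma")]
query_results = {
    "q1": [("alice", "94301", "flu"), ("bob", "94040", "asthma")],
    "q2": [("alice", "94301", "cold")],
    "q3": [("carol", "10001", "flu")],
}
ranking = rank_queries(sensitive, query_results)
```

In this toy setup the query whose result reproduces both leaked rows ranks first, and queries sharing only a few common values rank lower. A real audit would also need to respect column semantics and handle perturbed or partially matching records, which is the kind of problem the record-linkage and generative-model measures named in the abstract are aimed at.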