A Data Disclosure Policy for Count Data Based on the COM-Poisson Distribution

Authors:
Joseph B. Kadane; Ramayya Krishnan; Galit Shmueli
Affiliations:
Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213; The Heinz School of Public Policy and Management, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213; Department of Decision and Information Technologies, Smith School of Business, University of Maryland, College Park, Maryland 20742
Venue:
Management Science
Year:
2006

Abstract

Count data arise in various organizational settings. When the release of such data is sensitive, organizations need information-disclosure policies that protect data confidentiality while still providing data access. In contrast to extant disclosure policies, we describe a new policy for count tables that is based on disclosing only the sufficient statistics of a flexible discrete distribution. This distribution, the COM-Poisson, approximates Poisson counts well and also fits under- and over-dispersed counts. The sufficient statistics mask the exact cell counts and often also the table size. In a scenario with a data-holding agency and a data snooper, we show that this policy has low disclosure risk with no loss of data utility: usually, many count tables correspond to the disclosed sufficient statistics, and these tables are all equally likely to be the undisclosed table. Finding these tables requires solving a system of linear equations that is underdetermined for tables with more than three cells, and doing so can be computationally prohibitive even for small tables. We also consider cell-specific interval bounds, a commonly used disclosure limitation policy, and compare them with our policy. We describe several types of snooper knowledge, how each can be integrated with the disclosed statistics, and the implications. Applying this policy to three real data sets, we illustrate its low disclosure risk.
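The core idea can be illustrated with a small Python sketch. It assumes the standard COM-Poisson parameterization P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)). Under this parameterization, the sufficient statistics of n i.i.d. cells are n, S1 = Σ y_i and S2 = Σ log(y_i!). For simplicity it also assumes that the snooper knows the number of cells n. The function names (`sufficient_stats`, `com_poisson_log_likelihood`, `consistent_tables`) are hypothetical. The exhaustive search is a naive stand-in used only for illustration and is not the authors' solution method.

```python
import math
from itertools import combinations_with_replacement


def sufficient_stats(counts):
    """Return (n, S1, S2) for cell counts treated as i.i.d. COM-Poisson draws."""
    n = len(counts)                                  # number of cells
    s1 = sum(counts)                                 # S1 = sum of counts
    s2 = sum(math.lgamma(y + 1) for y in counts)     # S2 = sum of log(y!)
    return n, s1, s2


def com_poisson_log_likelihood(counts, lam, nu, terms=200):
    """Joint log-likelihood of i.i.d. COM-Poisson(lam, nu) counts.

    Assumes lam > 0 and a convergent series (nu > 0, or nu == 0 with lam < 1);
    the normalizing constant Z(lam, nu) is truncated after `terms` terms.
    """
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(terms)]
    peak = max(log_terms)                            # log-sum-exp for stability
    log_z = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    n, s1, s2 = sufficient_stats(counts)
    # The data enter the likelihood only through (n, S1, S2).
    return s1 * math.log(lam) - nu * s2 - n * log_z


def consistent_tables(n, s1, s2, tol=1e-9):
    """Brute-force list of multisets of n counts matching disclosed (n, S1, S2).

    Each multiset is a sorted tuple; every ordering of it over the cells is
    also consistent. The number of candidates grows combinatorially in n and S1.
    """
    matches = []
    for cand in combinations_with_replacement(range(s1 + 1), n):
        if sum(cand) != s1:
            continue
        cand_s2 = sum(math.lgamma(y + 1) for y in cand)
        # S2 is a float, so compare with a small tolerance.
        if math.isclose(cand_s2, s2, rel_tol=tol, abs_tol=tol):
            matches.append(cand)
    return matches


# Hypothetical 3-cell table: since 5! * 3! * 0! == 6! * 1! * 1! (= 720),
# a second table shares the same sufficient statistics.
table = [5, 3, 0]
n, s1, s2 = sufficient_stats(table)
print(consistent_tables(n, s1, s2))                  # [(0, 3, 5), (1, 1, 6)]

# Consistent tables have identical likelihood for every (lam, nu).
print(math.isclose(com_poisson_log_likelihood(table, 2.0, 0.8),
                   com_poisson_log_likelihood([1, 6, 1], 2.0, 0.8)))  # True
```

The snooper who sees only (3, 8, log 720) in this toy case cannot distinguish the six orderings of (0, 3, 5) from the three orderings of (1, 1, 6). Because the likelihood depends on the data only through the sufficient statistics, no choice of (λ, ν) favors one of these tables over another.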