Logistic regression with variables subject to post randomization method

  • Authors:
  • Yong Ming Jeffrey Woo;Aleksandra B. Slavković

  • Affiliations:
  • Department of Statistics, The Pennsylvania State University, University Park, PA;Department of Statistics, The Pennsylvania State University, University Park, PA

  • Venue:
  • PSD'12 Proceedings of the 2012 international conference on Privacy in Statistical Databases
  • Year:
  • 2012

Abstract

The Post Randomization Method (PRAM) is a disclosure avoidance method in which values of categorical variables are perturbed via a known probability mechanism and only the perturbed data are released, which raises issues of disclosure risk and data utility. In this paper, we develop and implement a number of EM algorithms to obtain unbiased estimates of the logistic regression model with data subject to PRAM, and thus effectively account for the effects of PRAM and preserve data utility. Three different cases are considered: (1) covariates subject to PRAM, (2) the response variable subject to PRAM, and (3) both covariates and the response variable subject to PRAM. The proposed techniques improve on current methodology by increasing the applicability of PRAM to a wider range of products and could be extended to other types of generalized linear models. The effects of the level of perturbation and sample size on the estimates are evaluated, and relevant standard error estimates are developed and reported.
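
To make the perturbation mechanism described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes categories are coded as integers 0..k-1 and that the known mechanism is a row-stochastic matrix whose entry (i, j) is the probability that true category i is released as category j. The function names apply_pram and posterior_true_response are hypothetical. The second function illustrates the kind of quantity a generic E-step could compute in case (2) with a binary response: the probability that the true response is 1 given the released value and the current model probability. The abstract does not spell out the paper's EM algorithms, so this is only an illustration of the general idea via Bayes' rule.

```python
import numpy as np


def apply_pram(values, transition, rng):
    """Perturb integer-coded categories with a known transition matrix.

    transition[i, j] is the probability that true category i is released
    as category j (an assumed convention for this sketch).
    """
    values = np.asarray(values)
    transition = np.asarray(transition, dtype=float)
    if not np.allclose(transition.sum(axis=1), 1.0):
        raise ValueError("each row of the transition matrix must sum to 1")
    k = transition.shape[0]
    # Draw each released value from the row belonging to its true category.
    return np.array([rng.choice(k, p=transition[v]) for v in values])


def posterior_true_response(pi, y_star, transition):
    """Posterior P(y = 1 | y*, x) for a binary response subject to PRAM.

    pi[i] is the current model probability P(y_i = 1 | x_i) and y_star[i]
    is the released (perturbed) response. Hypothetical helper for a
    generic E-step, not taken from the paper.
    """
    pi = np.asarray(pi, dtype=float)
    y_star = np.asarray(y_star)
    transition = np.asarray(transition, dtype=float)
    num = pi * transition[1, y_star]                 # true y = 1, released as y*
    den = num + (1.0 - pi) * transition[0, y_star]   # plus true y = 0, released as y*
    return num / den


# Usage: perturb a small binary response with a 2x2 PRAM matrix.
rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
y_true = np.array([0, 1, 1, 0, 1])
y_released = apply_pram(y_true, P, rng)
```

In an EM scheme of this general kind, the posterior probabilities would serve as fractional responses in a weighted logistic regression fit (the M-step), with the two steps alternated until the coefficient estimates stabilize.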