Randomization in privacy preserving data mining
ACM SIGKDD Explorations Newsletter
A widely used method for confidentiality protection in statistical databases is to add zero-mean noise to sensitive attribute values. Most studies assume that the attributes are normally distributed. Using an exponential random variable as an example, this article investigates the effect of additive noise data masking for attributes with skewed distributions. Examples of exponentially distributed sensitive attributes used for statistical analysis include the time between testing HIV positive and the manifestation of symptoms for AIDS, and the time between consecutive arrests for repeat offenders. We analyze the issues of data quality and confidentiality protection. Our results indicate that skewed attributes are, in some sense, better protected than normally distributed attributes under additive noise data masking.
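The masking scheme described above can be illustrated with a minimal sketch. It is not the article's own procedure: the Gaussian noise distribution, the function names, and the parameter values (`mean_time`, `noise_sd`, `n`, `seed`) are all assumptions chosen for illustration. It draws a hypothetical exponentially distributed attribute, adds independent zero-mean noise, and compares simple summary statistics before and after masking.

```python
import random
import statistics


def mask_with_additive_noise(values, noise_sd, rng):
    """Return each value perturbed by independent zero-mean noise.

    Gaussian noise is an assumption made for this sketch; the abstract
    only says the noise has zero mean.
    """
    return [x + rng.gauss(0.0, noise_sd) for x in values]


def demo(n=20000, mean_time=5.0, noise_sd=2.0, seed=0):
    """Mask a synthetic exponential attribute and summarize the effect."""
    rng = random.Random(seed)
    # Hypothetical sensitive attribute: exponential durations with the given mean.
    original = [rng.expovariate(1.0 / mean_time) for _ in range(n)]
    masked = mask_with_additive_noise(original, noise_sd, rng)
    return {
        "original_mean": statistics.fmean(original),
        "masked_mean": statistics.fmean(masked),
        "original_var": statistics.pvariance(original),
        "masked_var": statistics.pvariance(masked),
        # Durations are non-negative, but masked values need not be.
        "negative_masked": sum(1 for y in masked if y < 0),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.3f}")
```

With independent zero-mean noise, the masked mean should stay close to the original mean, while the masked variance should grow by roughly the noise variance. Some masked values fall below zero even though the original durations cannot, which is one visible way in which a skewed attribute behaves differently from a normal one under this kind of masking.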