Empirical researchers at universities and research centres increasingly want to use register or survey data collected by statistical agencies or the social security system, since these data can be used for a range of empirical studies, e.g. the analysis of special groups or of the quantitative effects of economic or social policies. Most of the required data have to be (factually) anonymised before they are disseminated in order to preserve confidentiality. In the area of statistics on households and individuals, Germany has followed this path for several years, and the transmission of de facto anonymised data files has proved to be a good form of co-operation between scientists and statisticians.

Factual anonymity of the data depends on the costs and benefits of a potential re-identification. The paper assumes that the intruder is willing to accept only low costs and therefore uses a cluster analysis module available in a standard statistical software package to re-identify persons. After a description of the method, the influence of different factors on the re-identification risk is studied using German employment statistics (register data) and the German Life History Study (survey data). The factors considered are the sampling fraction and the number of (irrelevant) variables. The results show that the number of identifiable persons is remarkably high. Furthermore, the cluster analysis confirms that the number of re-identifiable records increases with increasing sampling fraction and that irrelevant variables reduce this number.
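The two tendencies reported above (more re-identifications at higher sampling fractions, fewer when irrelevant variables are added) can be illustrated with a small synthetic simulation. This is a minimal sketch under stated assumptions, not the paper's procedure or data: it swaps the cluster-analysis module for plain distance-based (nearest-neighbour) record linkage, uses normally distributed synthetic key variables, models the intruder's file as the true values plus a small reporting error, and models "irrelevant" variables as values drawn independently in each file so that they carry no matching information. The function `simulate` and all of its parameters are hypothetical.

```python
import math
import random


def simulate(n_pop=2000, n_intruder=200, sampling_fraction=0.5,
             n_key=4, n_irrelevant=0, noise_sd=0.05, seed=1):
    """Hypothetical simulation: count intruder records whose nearest
    released record (Euclidean distance) is the same person.

    Assumptions (not from the paper): synthetic N(0, 1) key variables,
    Gaussian reporting error in the intruder's file, and irrelevant
    variables drawn independently in each file.
    """
    rng = random.Random(seed)

    # True key-variable values for every person in the population.
    population = [[rng.gauss(0, 1) for _ in range(n_key)]
                  for _ in range(n_pop)]

    # Released (anonymised) file: a random sample of the population,
    # plus irrelevant variables that have no counterpart in the other file.
    released_ids = rng.sample(range(n_pop), int(sampling_fraction * n_pop))
    released = {
        pid: population[pid] + [rng.gauss(0, 1) for _ in range(n_irrelevant)]
        for pid in released_ids
    }

    # Intruder's external file: persons drawn from the whole population
    # (not all of them are in the released sample), with reporting error.
    intruder_ids = rng.sample(range(n_pop), n_intruder)
    intruder = {
        pid: [v + rng.gauss(0, noise_sd) for v in population[pid]]
             + [rng.gauss(0, 1) for _ in range(n_irrelevant)]
        for pid in intruder_ids
    }

    # Link each intruder record to its nearest released record and count
    # the links that point to the same person (correct re-identifications).
    correct = 0
    for pid, record in intruder.items():
        nearest = min(released, key=lambda rid: math.dist(record, released[rid]))
        if nearest == pid:
            correct += 1
    return correct


if __name__ == "__main__":
    for f in (0.2, 0.5, 0.8):
        print("sampling fraction", f, "->", simulate(sampling_fraction=f))
    print("f=0.8 with 6 irrelevant variables ->",
          simulate(sampling_fraction=0.8, n_irrelevant=6))
```

In this toy setting, a larger `sampling_fraction` puts more of the intruder's persons into the released file, so more nearest-neighbour links are correct, while extra `n_irrelevant` variables add distance that is unrelated to identity and push the correct match away from the top rank. The numbers it produces are illustrative only and say nothing about the German data sets studied in the paper.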