In this paper, we investigate how location access patterns influence the re-identification of seemingly anonymous data. In the real world, individuals visit different locations that gather similar information. For instance, multiple hospitals collect health information on the same patient. To protect anonymity when sharing data for research purposes, hospitals release sensitive data, such as DNA sequences, stripped of explicit identifiers. Separately, for administrative functions, identified data, stripped of DNA, is made available. On a hospital-by-hospital basis, each pair of DNA and identified databases appears unlinkable; however, links can be established when the databases of multiple locations are studied together. This problem, known as trail re-identification, is a general phenomenon that occurs because an individual's location access pattern can be matched across the shared databases. Data holders cannot exchange data to find and suppress trails that would be re-identified. Thus, it is important to assess the re-identification risk in a system in order to develop techniques to mitigate it. In this research, we evaluate several real-world datasets and observe that trail re-identification is related to the ratio of people to places. To study this phenomenon in more detail, we develop a generative model for location access patterns that simulates observed behavior. We evaluate trail re-identification risk across a range of simulated patterns, and our findings suggest that the skew of the distribution of people to places is one of the main factors that drive trail re-identification.
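To make the idea of a trail concrete, the following is a minimal sketch of trail matching on toy data. It is not the method evaluated in the paper. It assumes, for illustration only, that every location releases both a de-identified and an identified record for each visitor, and that a link is claimed only when a location access pattern is unique in both releases. The function names, hospital names, sequence labels, and patient names are all hypothetical.

```python
from collections import defaultdict


def build_trails(releases):
    """Map each released value to the set of locations that released it."""
    trails = defaultdict(set)
    for location, values in releases.items():
        for value in values:
            trails[value].add(location)
    return {value: frozenset(locs) for value, locs in trails.items()}


def unique_by_trail(trails):
    """Return {trail: value} for trails that belong to exactly one value."""
    holders = defaultdict(list)
    for value, trail in trails.items():
        holders[trail].append(value)
    return {trail: vals[0] for trail, vals in holders.items() if len(vals) == 1}


def reidentify(dna_releases, identified_releases):
    """Link a DNA record to an identity when both share the same unique trail."""
    dna_unique = unique_by_trail(build_trails(dna_releases))
    name_unique = unique_by_trail(build_trails(identified_releases))
    shared = dna_unique.keys() & name_unique.keys()
    return {dna_unique[trail]: name_unique[trail] for trail in shared}


# Hypothetical toy releases: within any single hospital, the two sets
# carry no common key, so they look unlinkable in isolation.
dna_releases = {
    "hospital_a": {"seq1", "seq2", "seq3"},
    "hospital_b": {"seq1", "seq3", "seq4", "seq5"},
    "hospital_c": {"seq2", "seq3"},
}
identified_releases = {
    "hospital_a": {"alice", "bob", "carol"},
    "hospital_b": {"alice", "carol", "dave", "erin"},
    "hospital_c": {"bob", "carol"},
}

links = reidentify(dna_releases, identified_releases)
for sequence, name in sorted(links.items()):
    print(sequence, "->", name)
```

In this toy example, three records are linked because their location access patterns are unique, while the two patients who visited only hospital_b share a trail and stay unlinked. This mirrors the intuition in the abstract that how people are distributed across places affects how many trails can be re-identified.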