We live in an increasingly mobile world, which leads to the duplication of information across domains. Though organizations attempt to obscure the identities of their constituents when sharing information for worthwhile purposes, such as basic research, the uncoordinated nature of such environments can lead to privacy vulnerabilities. For instance, disparate healthcare providers can collect information on the same patient. Federal policy requires that such providers share "de-identified" sensitive data, such as biomedical (e.g., clinical and genomic) records. At the same time, such providers can share identified information, devoid of sensitive biomedical data, for administrative functions. On a provider-by-provider basis, the biomedical and identified records appear unrelated; however, links can be established when multiple providers' databases are studied jointly. This problem, known as trail disclosure, is a general phenomenon. It occurs because an individual's location access pattern (i.e., which providers the individual visited) can be matched across the shared databases. Technical and legal constraints often make it difficult for providers to coordinate with one another. It is therefore critical to assess disclosure risk in distributed environments, so that techniques to mitigate such risk can be developed. Research on privacy protection has so far focused on technologies that suppress or encrypt identifiers associated with sensitive information. A growing body of work formally assesses the disclosure risk of records in publicly shared databases, but less attention has been paid to the distributed setting. In this research, we review the trail disclosure problem in several domains with known vulnerabilities. We show that disclosure risk is influenced by the distribution of how people visit service providers. Based on this empirical evidence, we propose an entropy metric for assessing such risk in shared databases prior to their release.
This metric assesses risk from the statistical characteristics of a visit distribution rather than from person-level data. It is computationally efficient and superior to existing risk assessment methods, which rely on ad hoc assessments that are often computationally expensive and unreliable. We evaluate our approach on a range of location access patterns in simulated environments. Our results demonstrate that the approach is effective at estimating trail disclosure risk. They also show that the amount of self-information contained in a distributed system is one of the main factors driving that risk.
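The abstract does not spell out either the trail-linkage procedure or the exact form of the entropy metric. The Python below is therefore a minimal sketch under stated assumptions, not the authors' method.

It uses hypothetical toy data and treats a person's trail as a 0/1 vector over providers. It further assumes that every visit leaves both an identified and a de-identified record, so a unique trail links the two. As a stand-in for a metric built from aggregate visit statistics, it sums per-provider binary entropies, which assumes visits to different providers are independent. That independence assumption, the `VISITS` data, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: for each provider, the set of people who visited it.
# ASSUMPTION: every visit leaves both an identified record (e.g. a name in an
# administrative table) and a de-identified record (e.g. a genomic sample),
# so a person's trail looks the same in both kinds of release.
VISITS = {
    "A": {"alice", "bob", "carol", "erin"},
    "B": {"alice", "dave"},
    "C": {"bob", "carol", "dave", "erin"},
    "D": {"carol"},
}


def trails(visits):
    """Map each person to a 0/1 tuple (1 = visited) over providers in sorted order."""
    providers = sorted(visits)
    people = set().union(*visits.values())
    return {p: tuple(int(p in visits[loc]) for loc in providers) for p in people}


def trail_unique(visits):
    """People whose trail nobody else shares; under the assumption above,
    their de-identified records can be linked to their identified ones."""
    t = trails(visits)
    counts = Counter(t.values())
    return {p for p, trail in t.items() if counts[trail] == 1}


def empirical_trail_entropy(visits):
    """Shannon entropy (bits) of the observed trail distribution.
    Note: this requires person-level data."""
    counts = Counter(trails(visits).values())
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def binary_entropy(p):
    """Entropy (bits) of a single yes/no visit that happens with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def independent_trail_entropy(visit_rates):
    """Trail entropy if each provider is visited independently at its own rate.
    ASSUMPTION (not from the source): independence across providers.
    Uses only aggregate per-provider visit rates, not person-level records."""
    return sum(binary_entropy(r) for r in visit_rates)


if __name__ == "__main__":
    n = len(trails(VISITS))
    rates = [len(VISITS[loc]) / n for loc in sorted(VISITS)]
    print(sorted(trail_unique(VISITS)))                # ['alice', 'carol', 'dave']
    print(round(empirical_trail_entropy(VISITS), 3))   # 1.922
    print(round(independent_trail_entropy(rates), 3))  # 3.137
    print(round(math.log2(n), 3))                      # 2.322
```

In this toy example, bob and erin share a trail and so stay hidden in a group of two. Alice, carol, and dave each have a unique trail and are linkable.

The independent-rate entropy needs only per-provider visit counts and the population size. Reading a value well above log2(n) as a sign that most trails are expected to be unique is a rough rule of thumb adopted for this sketch, not a result stated in the abstract. For small populations, the independence assumption also overstates entropy relative to the empirical trail distribution.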