HIDE: heterogeneous information DE-identification

Authors:
James Gardner;Li Xiong;Kanwei Li;James J. Lu
Affiliations:
Emory University, Atlanta, GA;Emory University, Atlanta, GA;Emory University, Atlanta, GA;Emory University, Atlanta, GA
Venue:
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2009

Citing 12
Cited 1

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
k-anonymity: a model for protecting privacy

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Bottom-Up Generalization: A Data Mining Solution to Privacy Protection

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Top-Down Specialization for Information and Privacy Preservation

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Data Privacy through Optimal k-Anonymization

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Privacy and Ownership Preserving of Outsourced Medical Data

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Mondrian Multidimensional K-Anonymity

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
\ell -Diversity: Privacy Beyond \kappa -Anonymity

ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Role of local context in automatic deidentification of ungrammatical, fragmented text

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Hiding the presence of individuals from shared databases

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
M-invariance: towards privacy preserving re-publication of dynamic datasets

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
HIDE: An Integrated System for Health Information DE-identification

CBMS '08 Proceedings of the 2008 21st IEEE International Symposium on Computer-Based Medical Systems

An evaluation of feature sets and sampling techniques for de-identification of medical records

Proceedings of the 1st ACM International Health Informatics Symposium

Quantified Score

Hi-index	0.00

Visualization

Abstract

While there is an increasing need to share data that may contain personal information, such data sharing must preserve individual privacy without disclosing any identifiable information. A considerable amount of research in the data privacy community has been devoted to formalizing the notion of identifiability with many techniques for anonymization, but is focused exclusively on structured data. On the other hand, efforts on de-identifying medical text documents in the medical informatics community are highly specialized for specific document types or a subset of identifiers. In addition, they rely on simple identifier removal or grouping techniques and do not take advantage of the research developments in the data privacy community. We developed an integrated system, HIDE, for Heterogeneous Information DE-identification including structured and unstructured data utilizing existing anonymization techniques. We demonstrate a prototype of our system and show the effectiveness of our approach through a set of real data augmented with synthesized data.