Foundations of statistical natural language processing
Foundations of statistical natural language processing
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
k-anonymity: a model for protecting privacy
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Transforming data to satisfy privacy constraints
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Cost-Sensitive Learning by Cost-Proportionate Example Weighting
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Iterative record linkage for cleaning and integration
Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Bottom-Up Generalization: A Data Mining Solution to Privacy Protection
ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Top-Down Specialization for Information and Privacy Preservation
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Data Privacy through Optimal k-Anonymization
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Privacy and Ownership Preserving of Outsourced Medical Data
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
On the complexity of optimal K-anonymity
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Privacy-enhancing k-anonymization of customer data
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Incognito: efficient full-domain K-anonymity
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Reference reconciliation in complex information spaces
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
On k-anonymity and the curse of dimensionality
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Mondrian Multidimensional K-Anonymity
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
\ell -Diversity: Privacy Beyond \kappa -Anonymity
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Privacy Protection: p-Sensitive k-Anonymity Property
ICDEW '06 Proceedings of the 22nd International Conference on Data Engineering Workshops
Achieving anonymity via clustering
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Injecting utility into anonymized datasets
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Anonymizing sequential releases
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Anatomy: simple and effective privacy preservation
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Role of local context in automatic deidentification of ungrammatical, fragmented text
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Hiding the presence of individuals from shared databases
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
M-invariance: towards privacy preserving re-publication of dynamic datasets
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Anonymizing bipartite graph data using safe groupings
Proceedings of the VLDB Endowment
Privacy preserving serial data publishing by role composition
Proceedings of the VLDB Endowment
Privacy-preserving data publishing: A survey of recent developments
ACM Computing Surveys (CSUR)
Efficiently inducing features of conditional random fields
UAI'03 Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence
Anonymizing healthcare data: a case study on the blood transfusion service
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Centralized and Distributed Anonymization for High-Dimensional Healthcare Data
ACM Transactions on Knowledge Discovery from Data (TKDD)
An evaluation of feature sets and sampling techniques for de-identification of medical records
Proceedings of the 1st ACM International Health Informatics Symposium
Relationships and data sanitization: a study in scarlet
Proceedings of the 2010 workshop on New security paradigms
A hybrid stepwise approach for de-identifying person names in clinical documents
BioNLP '12 Proceedings of the 2012 Workshop on Biomedical Natural Language Processing
Editorial: COMPENDIUM: A text summarization system for generating abstracts of research papers
Data & Knowledge Engineering
Hi-index | 0.00 |
While there is an increasing need to share medical information for public health research, such data sharing must preserve patient privacy without disclosing any information that can be used to identify a patient. A considerable amount of research in data privacy community has been devoted to formalizing the notion of identifiability and developing techniques for anonymization but are focused exclusively on structured data. On the other hand, efforts on de-identifying medical text documents in medical informatics community rely on simple identifier removal or grouping techniques without taking advantage of the research developments in the data privacy community. This paper attempts to fill the above gaps and presents a framework and prototype system for de-identifying health information including both structured and unstructured data. We empirically study a simple Bayesian classifier, a Bayesian classifier with a sampling based technique, and a conditional random field based classifier for extracting identifying attributes from unstructured data. We deploy a k-anonymization based technique for de-identifying the extracted data to preserve maximum data utility. We present a set of preliminary evaluations showing the effectiveness of our approach.