Deidentification of clinical records is a crucial step before these records can be distributed to non-hospital researchers. Most approaches to deidentification rely heavily on dictionaries and heuristic rules; these approaches fail to remove most personal health information (PHI) that cannot be found in dictionaries. They can also fail to remove terms that are ambiguous between PHI and non-PHI.

Named entity recognition (NER) technologies can be used for deidentification. Some of these technologies exploit both the local and the global context of a word to identify its entity type. When documents are grammatically written, global context can improve NER.

In this paper, we show that we can deidentify medical discharge summaries using support vector machines that rely on a statistical representation of local context. We compare our approach with three different systems. Comparison with a rule-based approach shows that a statistical representation of local context contributes more to deidentification than dictionaries and hand-tailored heuristics. Comparison with two well-known systems, SNoW and IdentiFinder, shows that when the language of documents is fragmented, local context contributes more to deidentification than global context.
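To make the idea of a local-context representation concrete, the following is a minimal sketch of how a token might be described by its immediate neighbours only. The abstract does not specify the feature set, so the window size of two, the feature names, and the function `local_context_features` are all hypothetical choices made for illustration, not the paper's method.

```python
from typing import Dict, List


def local_context_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Describe token i using only itself and neighbours within `window`.

    Hypothetical feature set: the paper's actual features are not given
    in the abstract.
    """
    token = tokens[i]
    features = {
        "word": token.lower(),
        "is_capitalized": str(token[:1].isupper()),
        "has_digit": str(any(ch.isdigit() for ch in token)),
    }
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        # Positions outside the fragment get a padding symbol.
        neighbour = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        features[f"word[{offset:+d}]"] = neighbour
    return features


# A fragmented, ungrammatical clinical-style phrase (invented example).
tokens = "pt seen by dr smith on 3/14".split()
features = local_context_features(tokens, tokens.index("smith"))
```

Here the lowercase token `smith` would be missed by a capitalization heuristic, but the neighbouring `dr` is still available as evidence. Feature dictionaries like this could be one-hot encoded and passed to a support vector machine classifier that labels each token as PHI or non-PHI.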