A hybrid stepwise approach for de-identifying person names in clinical documents

  • Authors:
  • Oscar Ferrández;Brett R. South;Shuying Shen;Stéphane M. Meystre

  • Affiliations:
  • University of Utah, Salt Lake City, Utah and IDEAS Center SLCVA Healthcare System, Salt Lake City, Utah (all four authors)

  • Venue:
  • BioNLP '12 Proceedings of the 2012 Workshop on Biomedical Natural Language Processing
  • Year:
  • 2012


Abstract

As Electronic Health Records grow exponentially, along with large quantities of unstructured clinical information that could be used for research purposes, protecting patient privacy becomes a challenge that must be met. In this paper, we present a novel hybrid system designed to improve current strategies for de-identifying person names. To accomplish this task, our system comprises several components designed to meet two separate goals: 1) achieve the highest possible recall (no patient data can be exposed); and 2) create methods to filter out false positives. As a result, our system reached an F2-measure of 92.6% when de-identifying person names in Veterans Health Administration clinical notes, and considerably outperformed other existing "out-of-the-box" de-identification or named entity recognition systems.
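The F2-measure reported above is the F-beta score with beta = 2, which weights recall more heavily than precision. This matches the system's first goal that no patient data be exposed. The following is a minimal Python sketch of the standard F-beta formula; the function names `f_beta` and `precision_recall` and the example numbers are illustrative assumptions, not taken from the paper.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta > 1 favors recall; beta = 2 gives the F2-measure, beta = 1 gives F1.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical numbers: the same precision/recall pair, swapped.
    print(f"{f_beta(0.80, 0.95):.3f}")  # high recall  -> 0.916
    print(f"{f_beta(0.95, 0.80):.3f}")  # high precision -> 0.826
```

Swapping precision and recall changes F2 noticeably while leaving F1 unchanged, which is why a recall-first de-identification system is naturally evaluated with F2 rather than F1.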