UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text

  • Authors:
  • Dina Demner-Fushman; James G. Mork; Sonya E. Shooshan; Alan R. Aronson

  • Affiliations:
  • Lister Hill National Center for Biomedical Communications (LHNCBC), U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA (all authors)

  • Venue:
  • Journal of Biomedical Informatics
  • Year:
  • 2010

Abstract

Identification of medical terms in free text is a first step in such Natural Language Processing (NLP) tasks as automatic indexing of the biomedical literature and extraction of patients' problem lists from the text of clinical notes. Many tools developed to perform these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. We continue our exploration of automatic approaches to the creation of subsets (UMLS content views) that can support NLP processing of either the biomedical literature or clinical text. We found that suppressing highly ambiguous terms in the conservative AutoFilter content view can partially replace manual filtering for literature applications, and that suppressing two-character mappings in the same content view achieves 89.5% precision at 78.6% recall for clinical applications.
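The two suppression rules named in the abstract (dropping highly ambiguous terms and dropping two-character mappings) can be illustrated with a small filter over a term-to-concept mapping. This is a minimal sketch, not the authors' implementation: the input format (a dict from a term string to the set of concept identifiers it names), the function name `build_content_view`, and the cutoffs `max_senses` and `min_length` are all hypothetical. The abstract does not state the thresholds actually used for the AutoFilter content view.

```python
def build_content_view(
    term_to_cuis: dict[str, set[str]],
    max_senses: int = 2,   # hypothetical ambiguity cutoff
    min_length: int = 3,   # hypothetical length cutoff
) -> dict[str, set[str]]:
    """Return the subset of terms kept after suppression (illustrative only).

    term_to_cuis maps a term string to the set of concept identifiers
    it can refer to; this input format is an assumption for the sketch.
    """
    kept: dict[str, set[str]] = {}
    for term, cuis in term_to_cuis.items():
        # Suppress strings shorter than min_length; with the default of 3,
        # this drops two-character strings such as short abbreviations.
        if len(term) < min_length:
            continue
        # Suppress highly ambiguous strings that name more than
        # max_senses distinct concepts.
        if len(cuis) > max_senses:
            continue
        kept[term] = cuis
    return kept


# Placeholder concept identifiers, not real UMLS CUIs.
terms = {
    "ms": {"CUI-A", "CUI-B", "CUI-C"},      # short and ambiguous
    "cold": {"CUI-D", "CUI-E", "CUI-F"},    # ambiguous
    "myocardial infarction": {"CUI-G"},     # specific
}
print(build_content_view(terms))  # {'myocardial infarction': {'CUI-G'}}
```

Tightening either cutoff trades recall for precision: fewer spurious mappings survive, but some legitimate short or polysemous terms are lost, which is the balance the precision and recall figures above measure.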