Semi-supervised Maximum Entropy based approach to acronym and abbreviation normalization in medical texts

  • Authors: Serguei Pakhomov
  • Affiliations: Mayo Foundation, Rochester, MN
  • Venue: ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
  • Year: 2002

Abstract

Text normalization is an important aspect of successful information retrieval from medical documents such as clinical notes, radiology reports and discharge summaries. In the medical domain, a significant part of the general problem of text normalization is abbreviation and acronym disambiguation. Numerous abbreviations are used routinely throughout such texts, and knowing their meaning is critical to retrieving data from them. In this paper, I demonstrate a method for automatically generating training data for Maximum Entropy (ME) modeling of abbreviations and acronyms, and show that ME modeling is a promising technique for abbreviation and acronym normalization. I report the results of an experiment in which a number of ME models were trained to normalize abbreviations and acronyms on a sample of 10,000 rheumatology notes, achieving approximately 89% accuracy.
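The abstract does not spell out how the training data is generated or which features the ME models use, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes that wherever an expansion (e.g. "rheumatoid arthritis") appears spelled out in a note, the surrounding words can serve as an automatically labeled example for the corresponding abbreviation ("RA"). It then trains one conditional ME model per abbreviation, p(y | x) = exp(Σ_f w[f, y]) / Z(x), using binary bag-of-words context features and plain stochastic gradient ascent. The sense inventory `SENSES`, the toy `NOTES`, and all function names are hypothetical illustrations.

```python
import math
import re
from collections import defaultdict

# Hypothetical sense inventory: abbreviation -> candidate expansions.
SENSES = {"RA": ["rheumatoid arthritis", "right atrium"]}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def generate_training_data(notes, senses):
    """Auto-label contexts (assumed scheme): wherever an expansion is spelled
    out, its surrounding words become a training example for that sense."""
    data = defaultdict(list)  # abbreviation -> [(feature set, expansion)]
    for note in notes:
        lower = note.lower()
        for abbr, expansions in senses.items():
            for exp in expansions:
                for m in re.finditer(re.escape(exp), lower):
                    # Drop the expansion itself so only context words remain.
                    context = lower[:m.start()] + " " + lower[m.end():]
                    data[abbr].append((set(tokenize(context)), exp))
    return data

def predict_proba(weights, feats, labels):
    """p(y | x) = exp(sum_f w[f, y]) / Z(x), computed with max-shift for stability."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train_maxent(examples, labels, epochs=50, lr=0.5, l2=0.01):
    """Stochastic gradient ascent on the conditional log-likelihood, with an
    L2 penalty applied to the weights of features active in each example."""
    weights = defaultdict(float)  # (feature, label) -> weight
    for _ in range(epochs):
        for feats, gold in examples:
            probs = predict_proba(weights, feats, labels)
            for f in feats:
                for y in labels:
                    # Observed minus expected feature count for this example.
                    grad = (1.0 if y == gold else 0.0) - probs[y]
                    weights[(f, y)] += lr * (grad - l2 * weights[(f, y)])
    return weights

def train_all(data, senses):
    # One ME model per abbreviation, over that abbreviation's own senses.
    return {abbr: train_maxent(data[abbr], senses[abbr]) for abbr in data}

def normalize(note, abbr, models, senses=SENSES):
    """Pick the most probable expansion of `abbr` given the note's words."""
    feats = set(tokenize(note)) - {abbr.lower()}
    probs = predict_proba(models[abbr], feats, senses[abbr])
    return max(probs, key=probs.get)

# Toy, invented notes in which the expansions are written out in full.
NOTES = [
    "Patient with rheumatoid arthritis on methotrexate, joint swelling in hands.",
    "Long-standing rheumatoid arthritis, morning stiffness and joint pain.",
    "Echo shows dilated right atrium and tricuspid regurgitation.",
    "Enlarged right atrium on echo, atrial fibrillation noted.",
]

models = train_all(generate_training_data(NOTES, SENSES), SENSES)
print(normalize("RA flare with joint swelling, continue methotrexate.", "RA", models))
print(normalize("Echo: RA dilated, tricuspid regurgitation.", "RA", models))
```

On this toy data the two calls should resolve "RA" to "rheumatoid arthritis" and "right atrium" respectively. Training a separate model per abbreviation keeps each model's label set small; a real system would likely add richer features (e.g. a limited context window or note section) than this bag-of-words sketch.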