Identification, expansion, and disambiguation of acronyms in biomedical texts

Authors:
David B. Bracewell;Scott Russell;Annie S. Wu
Affiliations:
School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FLA;School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FLA;School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FLA
Venue:
ISPA'05 Proceedings of the 2005 international conference on Parallel and Distributed Processing and Applications
Year:
2005

Citing 4
Cited 0

WordNet: a lexical database for English
Communications of the ACM

A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval

Semi-supervised Maximum Entropy based approach to acronym and abbreviation normalization in medical texts
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

One sense per discourse
HLT '91 Proceedings of the workshop on Speech and Natural Language

Abstract

With the ever-growing amount of biomedical literature, there is an increasing desire to use sophisticated language processing algorithms to mine these texts. To use these algorithms, we must first deal with acronyms, abbreviations, and misspellings. In this paper we look at identifying, expanding, and disambiguating acronyms in biomedical texts. We break the task into three modular steps: Identification, Expansion, and Disambiguation. For Identification we use a hybrid approach composed of a naive Bayesian classifier and a couple of handcrafted rules, which achieves 99.96% accuracy with a small training set. We divide Expansion into two categories, local and global. For local expansion we use windowing and the longest common subsequence to generate the possible expansions; global expansion requires an acronym database. To disambiguate among the candidate expansions we use WordNet and semantic similarity. Overall we obtain recall and precision of over 91%.
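
The local expansion step named in the abstract (windowing plus longest common subsequence) can be illustrated with a minimal Python sketch. The abstract does not give the window size, the tokenization, or how the subsequence match is scored, so everything below is an assumption made for illustration: the function names `lcs_length` and `local_expansions`, the `window_factor` parameter, the choice to compare the acronym against the initial letters of the preceding words, and the shortest-first ordering are all hypothetical and may differ from the authors' method.

```python
import re


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def local_expansions(tokens, acronym_index, window_factor=2):
    """Propose expansions from the words just before an acronym.

    Assumption (not from the paper): the window holds at most
    window_factor * len(acronym) preceding tokens, and a span qualifies
    when every acronym letter appears, in order, among the span's initials.
    """
    # Strip punctuation such as the parentheses in "(LCS)".
    acronym = re.sub(r"[^A-Za-z0-9]", "", tokens[acronym_index]).lower()
    if not acronym:
        return []
    start = max(0, acronym_index - window_factor * len(acronym))
    window = tokens[start:acronym_index]

    candidates = []
    # Grow the span leftwards from the word adjacent to the acronym,
    # so shorter candidates are listed first.
    for begin in range(len(window) - 1, -1, -1):
        span = window[begin:]
        initials = "".join(word[0].lower() for word in span if word)
        if lcs_length(acronym, initials) == len(acronym):
            candidates.append(" ".join(span))
    return candidates


tokens = "the longest common subsequence (LCS) is used".split()
print(local_expansions(tokens, 4))
```

When a window yields several candidates, as it does for this sentence, a separate step has to choose among them; the abstract assigns that role to the disambiguation step based on WordNet and semantic similarity, which this sketch does not attempt.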