A large scale, corpus-based approach for automatically disambiguating biomedical abbreviations

Authors:
Hong Yu; Won Kim; Vasileios Hatzivassiloglou; John Wilbur
Affiliations:
University of Wisconsin-Milwaukee, Milwaukee, WI; National Center for Biotechnology Information, Bethesda, MD; University of Texas, Richardson, TX; National Center for Biotechnology Information, Bethesda, MD
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2006

Abstract

Abbreviations and acronyms are widely used in the biomedical literature, and many of them represent important biomedical concepts. Because many abbreviations are ambiguous (e.g., CAT can denote either chloramphenicol acetyl transferase or computed axial tomography, depending on the context), recognizing the full form associated with each abbreviation is in most cases equivalent to identifying the meaning of the abbreviation. This, in turn, allows us to perform more accurate natural language processing, information extraction, and retrieval. In this study, we developed supervised approaches to identifying the full forms of ambiguous abbreviations in the context in which they appear. We first automatically generated a dictionary of all possible full forms for each abbreviation; we then treated the in-context full-form prediction for each specific abbreviation occurrence as a case of word-sense disambiguation and applied supervised machine-learning algorithms to it. Because some of the links between abbreviations and their corresponding full forms are given explicitly in the text and can be recovered automatically, we can use these explicit links to automatically provide training data for disambiguating the abbreviations that are not linked to a full form within a text. We evaluated our methods on over 150 thousand abstracts and obtained coverage and precision of 82% and 92%, respectively, under tenfold cross-validation, and 79% and 80%, respectively, when evaluated against an external set of abstracts in which the abbreviations are not defined.
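
The idea of harvesting training data from explicit links can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how definitions are detected or which learning algorithms are used, so the "full form (ABBR)" pattern, the initial-letter matching heuristic, the naive Bayes scorer, and the function names harvest, train, and disambiguate are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: an explicit link looks like "full form (ABBR)".
DEFINITION = re.compile(r"\(([A-Z]{2,10})\)")
TOKEN = re.compile(r"[a-z]+")


def harvest(text):
    """Yield (abbreviation, full_form, context_tokens) for each explicit link."""
    for match in DEFINITION.finditer(text):
        abbr = match.group(1)
        preceding = text[:match.start()].split()
        candidate = preceding[-len(abbr):]
        # Hypothetical heuristic: word initials must spell the abbreviation.
        initials = "".join(word[0].lower() for word in candidate)
        if initials != abbr.lower():
            continue
        full_form = " ".join(candidate).lower()
        # Drop the definition itself so the model learns from context only.
        excluded = set(full_form.split()) | {abbr.lower()}
        context = [t for t in TOKEN.findall(text.lower()) if t not in excluded]
        yield abbr, full_form, context


def train(texts):
    """Collect per-abbreviation sense counts and context word counts."""
    priors = defaultdict(Counter)
    counts = defaultdict(lambda: defaultdict(Counter))
    for text in texts:
        for abbr, full_form, context in harvest(text):
            priors[abbr][full_form] += 1
            counts[abbr][full_form].update(context)
    return priors, counts


def disambiguate(abbr, text, priors, counts):
    """Pick the most likely full form; return None if abbr is not in the dictionary."""
    if abbr not in priors:
        return None  # no prediction, which lowers coverage rather than precision
    tokens = TOKEN.findall(text.lower())
    vocab = {w for word_counts in counts[abbr].values() for w in word_counts}
    total = sum(priors[abbr].values())
    best, best_score = None, -math.inf
    for sense, n in priors[abbr].items():
        word_counts = counts[abbr][sense]
        # Add-one smoothing, with one extra slot for unseen words.
        denom = sum(word_counts.values()) + len(vocab) + 1
        score = math.log(n / total)
        for t in tokens:
            score += math.log((word_counts[t] + 1) / denom)
        if score > best_score:
            best, best_score = sense, score
    return best
```

In this sketch, abstracts that define an abbreviation supply labeled examples at no annotation cost, and the trained model is then applied to occurrences that carry no definition. Returning None for an abbreviation missing from the dictionary is one plausible reason a system's coverage can fall below 100% while precision is measured only over the predictions actually made.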