We present an algorithm for extracting abbreviation definitions from biomedical text. Our approach is based on an alignment HMM that matches abbreviations to their definitions. We report 98% precision and 93% recall on a standard data set, and 95% precision and 91% recall on an additional test set. These results improve on previously reported methods, and the model has several further advantages: (1) it is simpler and faster than a comparable alignment-based abbreviation extractor; (2) it generalizes naturally to specific types of abbreviations, e.g., abbreviations of chemical formulas; (3) it is trained on a set of unlabeled examples; and (4) it associates a probability with each predicted definition. Using the abbreviation alignment model, we extracted over 1.4 million abbreviations from a corpus of 200K full-text PubMed papers, including 455,844 unique definitions.
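To make the idea of aligning an abbreviation with its definition concrete, here is a minimal sketch in Python. It is not the alignment HMM described above: it replaces the trained, probabilistic model with a fixed hand-picked score (2 for a match at the start of a word, 1 for a match inside a word) and finds the best in-order alignment by dynamic programming. The function name `align_abbreviation` and the score values are assumptions made for illustration only.

```python
def align_abbreviation(short_form, long_form):
    """Align each character of short_form, in order, to a character of long_form.

    Returns a list of indices into long_form (one per short-form character),
    or None if no in-order alignment exists. Scores are illustrative
    assumptions: word-initial match = 2, word-internal match = 1.
    """
    s = short_form.lower()
    t = long_form.lower()
    n, m = len(s), len(t)
    neg_inf = float("-inf")

    # best[i][j]: best score for aligning s[:i] using only characters of t[:j]
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    # took[i][j]: True if the best alignment matches s[i-1] to t[j-1]
    took = [[False] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        best[0][j] = 0  # an empty short form aligns trivially

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Option 1: leave long-form character t[j-1] unmatched.
            best[i][j] = best[i][j - 1]
            # Option 2: match s[i-1] to t[j-1] if the characters agree.
            if s[i - 1] == t[j - 1] and best[i - 1][j - 1] > neg_inf:
                word_initial = j == 1 or not t[j - 2].isalnum()
                score = best[i - 1][j - 1] + (2 if word_initial else 1)
                if score > best[i][j]:
                    best[i][j] = score
                    took[i][j] = True

    if best[n][m] == neg_inf:
        return None

    # Trace back from the end to recover the matched positions.
    positions = []
    i, j = n, m
    while i > 0:
        if took[i][j]:
            positions.append(j - 1)
            i -= 1
        j -= 1
    positions.reverse()
    return positions


if __name__ == "__main__":
    definition = "hidden Markov model"
    alignment = align_abbreviation("HMM", definition)
    for ch, pos in zip("HMM", alignment):
        print(ch, "->", definition[pos], "at index", pos)
```

In this sketch a candidate definition that cannot be aligned at all returns `None`. A probabilistic model such as the one in the abstract would instead learn its alignment parameters from examples and attach a probability to each predicted definition, which a fixed scoring scheme like this one cannot do.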