Enhancing a biomedical information extraction system with dictionary mining and context disambiguation

Authors:
S. Mukherjea; L. V. Subramaniam; G. Chanda; S. Sankararaman; R. Kothari; V. Batra; D. Bhardwaj; B. Srivastava
Affiliations:
IBM Research Division, IBM India Research Laboratory, Block I, Indian Institute of Technology (IIT), Hauz Khas, New Delhi 110016 (all authors)
Venue:
IBM Journal of Research and Development
Year:
2004

Abstract

Journals and conference proceedings are the dominant mechanisms for reporting new biomedical results. Because these publications are unstructured, it is difficult to apply data mining or automated knowledge discovery techniques to them. Annotation (or markup) of these unstructured documents is the first step toward making them machine-analyzable. However, biomedical literature often uses similar (or identical) labels for different entities and different labels for the same entity, which makes entity extraction difficult. In this paper we present BioAnnotator, a system for identifying and classifying biological terms in documents. BioAnnotator uses domain-based dictionary lookup to recognize known terms and a rule engine to discover new terms. We explain how the system uses a biomedical dictionary to learn extraction patterns for the rule engine and how it disambiguates biological terms that belong to multiple semantic classes.
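To make the dictionary-lookup and disambiguation steps more concrete, here is a minimal Python sketch. It assumes a token-level trie for longest-match dictionary lookup. For terms listed under several semantic classes, it assumes a vote over nearby cue words, with a "one sense per discourse" fallback: unresolved occurrences of a term take the majority label of its resolved occurrences in the same document. The dictionary entries, class names (`GENE`, `PROTEIN`), cue words, window size, and every function name are hypothetical illustrations. This is not BioAnnotator's actual implementation, and it leaves out the rule engine and pattern learning entirely.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy dictionary: term -> semantic classes. A term listed under
# more than one class (e.g. a gene and its protein product) is ambiguous.
DICTIONARY: Dict[str, Set[str]] = {
    "p53": {"GENE", "PROTEIN"},
    "brca1": {"GENE"},
    "insulin": {"PROTEIN"},
    "tumor suppressor": {"PROTEIN"},
}

# Hypothetical context cue words per class (assumed, not from BioAnnotator).
CUES: Dict[str, Set[str]] = {
    "GENE": {"mutation", "expression", "allele", "promoter", "transcription"},
    "PROTEIN": {"binds", "phosphorylation", "enzyme", "domain", "complex"},
}

END = "$end"  # sentinel key: a dictionary term ends at this trie node

Span = Tuple[int, int, str, Set[str]]


def build_trie(dictionary: Dict[str, Set[str]]) -> dict:
    """Store each term as a path of tokens so multi-word terms can be matched."""
    trie: dict = {}
    for term, classes in dictionary.items():
        node = trie
        for tok in term.split():
            node = node.setdefault(tok, {})
        node[END] = classes
    return trie


def lookup(tokens: List[str], trie: dict) -> List[Span]:
    """Greedy longest-match scan returning (start, end, term, classes) spans."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        node, best, j = trie, None, i
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:  # remember the longest term seen so far
                best = (i, j, " ".join(tokens[i:j]), node[END])
        if best:
            spans.append(best)
            i = best[1]  # resume scanning after the matched term
        else:
            i += 1
    return spans


def disambiguate(tokens: List[str], spans: List[Span],
                 window: int = 3) -> List[Tuple[str, Optional[str]]]:
    """Assign one class per span using cue words, then one sense per discourse."""
    labels: List[Tuple[str, Optional[str]]] = []
    for start, end, term, classes in spans:
        if len(classes) == 1:
            labels.append((term, next(iter(classes))))
            continue
        context = set(tokens[max(0, start - window):start] + tokens[end:end + window])
        scores = {c: len(CUES.get(c, set()) & context) for c in classes}
        top = max(scores.values())
        winners = [c for c, s in scores.items() if s == top]
        # Leave the span unresolved if no cue fired or two classes tie.
        labels.append((term, winners[0] if top > 0 and len(winners) == 1 else None))

    # Count the labels of resolved occurrences of each term in this document.
    votes: Dict[str, Counter] = {}
    for term, label in labels:
        if label is not None:
            votes.setdefault(term, Counter())[label] += 1

    # Unresolved occurrences inherit the term's majority label, if it has one.
    resolved: List[Tuple[str, Optional[str]]] = []
    for term, label in labels:
        if label is None and term in votes:
            label = votes[term].most_common(1)[0][0]
        resolved.append((term, label))
    return resolved


if __name__ == "__main__":
    text = ("p53 mutation was detected . we then measured p53 expression in tumor cells . "
            "p53 was elevated but insulin was normal")
    tokens = text.lower().split()
    print(disambiguate(tokens, lookup(tokens, build_trie(DICTIONARY))))
```

In this example the first two occurrences of "p53" are labeled `GENE` from the nearby cues "mutation" and "expression". The third occurrence has no cue in its window, so it inherits `GENE` from the document-level vote. The script prints `[('p53', 'GENE'), ('p53', 'GENE'), ('p53', 'GENE'), ('insulin', 'PROTEIN')]`. A real system would need a much larger dictionary and a principled way to obtain the cue words rather than hand-listing them.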