Information extraction from biomedical literature: methodology, evaluation and an application

Authors:
L. Venkata Subramaniam;Sougata Mukherjea;Pankaj Kankar;Biplav Srivastava;Vishal S. Batra;Pasumarti V. Kamesam;Ravi Kothari
Affiliations:
IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India
Venue:
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Year:
2003


Abstract

Journals and conference proceedings are the dominant mechanisms for reporting new biomedical results. The unstructured nature of such publications makes it difficult to apply data mining or automated knowledge discovery techniques. Annotation (or markup) of these unstructured documents is the first step in making them machine-analyzable. In this paper we first present a system called BioAnnotator for identifying and annotating biological terms in documents. BioAnnotator uses domain-based dictionary look-up to recognize known terms and a rule engine to discover new terms. The combination of dictionary look-up and rules results in good performance (87% precision and 94% recall on the GENIA 1.1 corpus for extracting general biological terms, based on an approximate matching criterion). To demonstrate the subsequent mining and knowledge discovery activities that BioAnnotator makes feasible, we also present a system called MedSummarizer that uses the extracted terms to identify the common concepts in a given group of genes.
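
The abstract's combination of dictionary look-up and rules, scored under an approximate matching criterion, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from BioAnnotator: the toy dictionary `KNOWN_TERMS`, the single pattern `NEW_TERM_RULE` (a token mixing letters and digits), and the choice to treat "approximate match" as any character overlap between a predicted span and a gold span. The paper's actual dictionaries, rule engine, and matching criterion are not described in this record.

```python
import re

# Hypothetical toy dictionary of known terms (not the paper's dictionaries).
KNOWN_TERMS = ["interleukin-2", "T cells", "NF-kappa B"]

# Hypothetical rule for unseen terms: letters, optional hyphen, then digits,
# e.g. "CD28" or "p53". A real rule engine would hold many such rules.
NEW_TERM_RULE = re.compile(r"\b[A-Za-z]+-?\d+[A-Za-z0-9]*\b")


def overlaps(a, b):
    """True if two spans share at least one character (only start/end are used)."""
    return a[0] < b[1] and b[0] < a[1]


def annotate(text):
    """Return sorted (start, end, source) spans; dictionary hits take priority."""
    spans = []
    for term in KNOWN_TERMS:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), "dictionary"))
    dictionary_spans = list(spans)
    for m in NEW_TERM_RULE.finditer(text):
        candidate = (m.start(), m.end())
        # Skip rule matches that a dictionary entry already covers.
        if not any(overlaps(candidate, d) for d in dictionary_spans):
            spans.append((m.start(), m.end(), "rule"))
    return sorted(spans)


def precision_recall(predicted, gold):
    """Precision and recall where overlap counts as a match (assumed criterion)."""
    hit_pred = sum(any(overlaps(p, g) for g in gold) for p in predicted)
    hit_gold = sum(any(overlaps(g, p) for p in predicted) for g in gold)
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sentence = "Interleukin-2 activates T cells and induces CD28 expression."
    for start, end, source in annotate(sentence):
        print(sentence[start:end], source)
```

Overlap-based matching is more lenient than exact boundary matching, so scores computed this way are not comparable to exact-match figures.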