Information extraction from biomedical literature: methodology, evaluation and an application

  • Authors:
  • L. Venkata Subramaniam;Sougata Mukherjea;Pankaj Kankar;Biplav Srivastava;Vishal S. Batra;Pasumarti V. Kamesam;Ravi Kothari

  • Affiliations:
  • IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India;IBM India Research Lab, New Delhi, India

  • Venue:
  • CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
  • Year:
  • 2003

Abstract

Journals and conference proceedings are the dominant mechanisms for reporting new biomedical results. The unstructured nature of such publications makes it difficult to apply data mining or automated knowledge discovery techniques. Annotation (or markup) of these unstructured documents is the first step in making them machine-analyzable. In this paper we first present a system called BioAnnotator for identifying and annotating biological terms in documents. BioAnnotator uses domain-based dictionary look-up for recognizing known terms and a rule engine for discovering new terms. The combination of dictionary look-up and rules results in good performance (87% precision and 94% recall on the GENIA 1.1 corpus for extracting general biological terms, based on an approximate matching criterion). To demonstrate the subsequent mining and knowledge discovery activities that BioAnnotator makes feasible, we also present a system called MedSummarizer that uses the extracted terms to identify the common concepts in a given group of genes.
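
The two-stage idea described in the abstract (dictionary look-up for known terms, rules for unseen ones, scored with an approximate matching criterion) can be illustrated with a minimal Python sketch. This is not the BioAnnotator implementation: the dictionary entries, the suffix rule, and the function names are hypothetical, and "approximate matching" is assumed here to mean that a predicted span counts as correct when it overlaps a gold span, which the abstract does not specify.

```python
import re

# Hypothetical toy dictionary; the real system's domain dictionaries
# are not described in the abstract.
KNOWN_TERMS = {"interleukin-2", "nf-kappa b", "t cells"}

# Hypothetical, deliberately crude rule: a token ending in "ase" is
# treated as a candidate enzyme name (it would also catch "phase").
SUFFIX_RULE = re.compile(r"\b[A-Za-z0-9-]+ase\b")


def overlaps(a, b):
    """True if two (start, end, ...) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def annotate(text):
    """Return sorted (start, end, surface, source) spans found in text."""
    spans = []
    # Stage 1: case-insensitive dictionary look-up of known terms.
    for term in KNOWN_TERMS:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            spans.append((m.start(), m.end(), m.group(), "dictionary"))
    # Stage 2: rule-based discovery, skipping text already annotated.
    for m in SUFFIX_RULE.finditer(text):
        candidate = (m.start(), m.end(), m.group(), "rule")
        if not any(overlaps(candidate, s) for s in spans):
            spans.append(candidate)
    return sorted(spans)


def precision_recall(predicted, gold):
    """Score spans under the assumed overlap-based matching criterion."""
    matched_pred = sum(any(overlaps(p, g) for g in gold) for p in predicted)
    matched_gold = sum(any(overlaps(g, p) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sentence = "Interleukin-2 activates protein kinase in T cells."
    found = annotate(sentence)
    # Hypothetical gold annotation: "Interleukin-2" and "protein kinase".
    gold_spans = [(0, 13), (24, 38)]
    print(found)
    print(precision_recall(found, gold_spans))
```

In this sketch the rule stage only fires on text the dictionary left untouched, so known terms take priority over discovered ones. Under overlap-based scoring a partial hit such as "kinase" against the gold term "protein kinase" still counts as a match, which is one reason approximate-match figures tend to be higher than exact-match ones.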