Biomarker information extraction tool (BIET) development using natural language processing and machine learning

  • Authors:
  • Md Tawhidul Islam;M. Shaikh;A. Nayak;S. Ranganathan

  • Affiliations:
  • Macquarie University, NSW, Australia;University of Tokyo, Bunkyo Ku, Tokyo, Japan;Macquarie University, NSW, Australia;Macquarie University, NSW, Australia

  • Venue:
  • Proceedings of the International Conference and Workshop on Emerging Trends in Technology
  • Year:
  • 2010

Abstract

In recent years, there has been rising interest in extracting entities and relations from the biomedical literature. Many systems and approaches have been proposed to extract biological relations, but none achieves satisfactory results because they fail to handle the grammatical complexity and subtle features of biomedical text. In this paper, we detail an approach to one specific information extraction task: extracting biomarker information from the biomedical literature. Starting with the abstract of a given publication, we first identify the evaluative sentence(s) by recognizing words and phrases that belong to semantic categories of interest for biomedical entities (semantic category recognition). For entities such as proteins, genes and diseases, we then determine whether the statement refers to a biomarker relationship (assertion classification). Finally, we identify the biomarker relationship among the biomedical entities (semantic relationship classification). Our approach uses a series of statistical models that rely heavily on local lexical and syntactic context and achieve competitive results compared with more complex NLP solutions. We conclude by presenting the design of a system, the Biomarker Information Extraction Tool (BIET). BIET combines our solutions to semantic category recognition, assertion classification and semantic relationship classification into a single application that facilitates the extraction of semantic information from medical text. We designed and implemented the ML-based BIET system for biomarker extraction using support vector machines, and trained and tested it on a corpus of oncology-related PubMed/MEDLINE literature hand-annotated with biomarker information. Several tests were performed to assess the performance of the system's components, namely the semantic category recognizer, the assertion classifier and the semantic relationship classifier. The system achieves an average F-score of 86% on the task of biomarker information extraction when compared against the human-annotated dataset (i.e., the gold standard).
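
For readers unfamiliar with the metric, the F-score reported above is conventionally the harmonic mean of precision and recall, computed by comparing system output with the gold-standard annotations. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function names (`f_score`, `evaluate`), the exact-match comparison of (entity, disease) pairs, and the placeholder data are all assumptions made for illustration; the abstract does not say how matches were counted or how the average was taken.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def evaluate(predicted, gold):
    """Score a set of predicted relations against a gold-standard set.

    Assumption: a prediction counts as correct only on an exact match.
    """
    tp = len(predicted & gold)   # relations found by both system and annotators
    fp = len(predicted - gold)   # relations the system produced in error
    fn = len(gold - predicted)   # annotated relations the system missed
    return f_score(tp, fp, fn)


# Placeholder (entity, disease) pairs, purely illustrative and not from the paper.
gold = {("GENE_A", "DISEASE_X"), ("GENE_B", "DISEASE_Y"), ("GENE_C", "DISEASE_Z")}
predicted = {("GENE_A", "DISEASE_X"), ("GENE_B", "DISEASE_Y"), ("GENE_D", "DISEASE_Z")}

print(f"F-score: {evaluate(predicted, gold):.3f}")
```

With `beta=1.0` precision and recall are weighted equally; an average over several components, as in the abstract, would apply the same calculation per component before averaging.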