Biomarker information extraction tool (BIET) development using natural language processing and machine learning

Authors:
Md Tawhidul Islam;M. Shaikh;A. Nayak;S. Ranganathan
Affiliations:
Macquarie University, NSW, Australia;University of Tokyo, Bunkyo Ku, Tokyo, Japan;Macquarie University, NSW, Australia;Macquarie University, NSW, Australia
Venue:
Proceedings of the International Conference and Workshop on Emerging Trends in Technology
Year:
2010

Abstract

In recent years, there has been rising interest in extracting entities and relations from the biomedical literature. Many systems and approaches have been proposed to extract biological relations, but none achieves satisfactory results, because they fail to handle the grammatical complexities and subtle features of biomedical text. In this paper, we detail an approach to one specific information extraction task: extracting biomarker information from the biomedical literature. Starting with the abstract of a given publication, we first identify the evaluative sentence(s) by recognizing words and phrases in the text that belong to semantic categories of interest for biomedical entities (semantic category recognition). For entities such as proteins, genes and diseases, we then determine whether the statement refers to a biomarker relationship (assertion classification). Finally, we identify the biomarker relationship among the biomedical entities (semantic relationship classification). Our approach uses a series of statistical models that rely heavily on local lexical and syntactic context and achieve results competitive with more complex NLP solutions. We conclude the paper by presenting the design of a system, the Biomarker Information Extraction Tool (BIET). BIET combines our solutions to semantic category recognition, assertion classification and semantic relationship classification into a single application that facilitates the extraction of semantic information from medical text. We designed and implemented the ML-based BIET system for biomarker extraction using support vector machines, and trained and tested it on a corpus of oncology-related PubMed/MEDLINE literature hand-annotated with biomarker information. Several tests were performed to assess the performance of the system's components, namely the semantic category recognizer, the assertion classifier and the semantic relationship classifier. The system achieves an average F-score of 86% on the biomarker information extraction task when compared against the human-annotated dataset (i.e., the gold standard).
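
The three-stage flow described in the abstract, and the F-score used to evaluate it, can be illustrated with a minimal sketch. This is not the BIET implementation: the paper trains support vector machines for each stage, whereas the code below substitutes a toy lexicon and cue-phrase rules so that it runs with the standard library alone. The lexicon entries, cue phrases, function names and example sentences are all assumptions made for illustration.

```python
import re
from typing import Iterable, List, Set, Tuple

# Hypothetical toy resources; the paper uses trained SVM models instead.
ENTITY_LEXICON = {
    "her2": "protein",
    "psa": "protein",
    "brca1": "gene",
    "breast cancer": "disease",
    "prostate cancer": "disease",
}
ASSERTION_CUES = ("biomarker", "marker for", "predictor of")

Entity = Tuple[str, str]    # (surface text, semantic category)
Relation = Tuple[str, str]  # (protein or gene, disease)


def recognize_categories(sentence: str) -> List[Entity]:
    """Stage 1, semantic category recognition (lexicon lookup stand-in)."""
    lowered = sentence.lower()
    return [
        (term, category)
        for term, category in ENTITY_LEXICON.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]


def classify_assertion(sentence: str) -> bool:
    """Stage 2, assertion classification (cue-phrase stand-in)."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in ASSERTION_CUES)


def classify_relations(entities: List[Entity]) -> Set[Relation]:
    """Stage 3, relationship classification: pair each protein/gene with each disease."""
    markers = [text for text, cat in entities if cat in ("protein", "gene")]
    diseases = [text for text, cat in entities if cat == "disease"]
    return {(m, d) for m in markers for d in diseases}


def extract_biomarkers(sentences: Iterable[str]) -> Set[Relation]:
    """Run the three stages in order over the sentences of one abstract."""
    found: Set[Relation] = set()
    for sentence in sentences:
        entities = recognize_categories(sentence)
        if entities and classify_assertion(sentence):
            found |= classify_relations(entities)
    return found


def f_score(predicted: Set[Relation], gold: Set[Relation]) -> float:
    """Balanced F-score (harmonic mean of precision and recall) against a gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy sentences and a toy gold standard, invented for this sketch.
abstract_sentences = [
    "HER2 is an established biomarker for breast cancer.",
    "BRCA1 was sequenced in patients with breast cancer.",
    "Serum PSA is a predictor of prostate cancer.",
]
gold_standard = {
    ("her2", "breast cancer"),
    ("psa", "prostate cancer"),
    ("brca1", "breast cancer"),
}
predicted_relations = extract_biomarkers(abstract_sentences)
score = f_score(predicted_relations, gold_standard)
```

Keeping each stage as a separate function mirrors the component-wise evaluation mentioned in the abstract: any stand-in could be replaced by a trained classifier with the same input and output without changing `extract_biomarkers`.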