A knowledge-driven conditional approach to extract pharmacogenomics specific drug-gene relationships from free text

Authors:
Rong Xu;Quanqiu Wang
Affiliations:
Medical Informatics Division, Case Western Reserve University, OH, USA;ThinTek LLC, Palo Alto, CA, USA
Venue:
Journal of Biomedical Informatics
Year:
2012

Abstract

An important task in pharmacogenomics (PGx) studies is to identify genetic variants that may affect drug response. The success of many systematic and integrative computational approaches for PGx studies depends on the availability of accurate, comprehensive and machine-understandable knowledge bases of drug-gene relationships. Scientific literature is one of the most comprehensive knowledge sources for PGx-specific drug-gene relationships. However, the major barrier to accessing this information is that the knowledge is buried in a large amount of free text with limited machine understandability. Therefore, there is a need for automatic approaches that extract structured PGx-specific drug-gene relationships from unstructured free-text literature. In this study, we developed a conditional relationship extraction approach that extracts PGx-specific drug-gene pairs from 20 million MEDLINE abstracts, using known drug-gene pairs as prior knowledge. We demonstrated that the conditional drug-gene relationship extraction approach significantly improves precision and F1 measure compared with the unconditioned approach, at the cost of lower recall (precision: 0.345 vs. 0.11; recall: 0.481 vs. 1.00; F1: 0.402 vs. 0.201). A co-occurrence-based method is used as the underlying relationship extraction method because of its simplicity. It can be replaced by or combined with more advanced methods, such as machine learning or natural language processing approaches, to further improve the performance of drug-gene relationship extraction from free text. Our method is not limited to extracting drug-gene relationships; it can be generalized to extract other types of relationships when related background knowledge bases exist.
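The abstract names the ingredients (co-occurrence extraction, known drug-gene pairs as prior knowledge, precision/recall/F1 evaluation) but does not spell out the conditioning rule. The following is a minimal sketch under one assumed reading that is consistent with the reported numbers, where the unconditioned approach reaches recall 1.00 and the conditional one trades recall for precision: a co-occurring pair is kept only if it also appears in a background set of known drug-gene pairs. All function names, the toy abstracts, lexicons, background pairs and gold standard are hypothetical illustrations, not code or data from the paper, and naive token matching stands in for real drug and gene name recognition.

```python
from itertools import product


def cooccurring_pairs(abstracts, drugs, genes):
    """Unconditioned step: every (drug, gene) pair whose names co-occur in an abstract.

    Hypothetical simplification: whitespace tokenization plus exact,
    case-insensitive lexicon lookup instead of real entity recognition.
    """
    drug_set = {d.lower() for d in drugs}
    gene_set = {g.lower() for g in genes}
    pairs = set()
    for text in abstracts:
        tokens = {tok.strip(".,;:()").lower() for tok in text.split()}
        # Pair every drug mention with every gene mention in the same abstract.
        pairs.update(product(tokens & drug_set, tokens & gene_set))
    return pairs


def conditional_pairs(candidates, known_pairs):
    """Assumed conditioning rule: keep a candidate only if the background knowledge base contains it."""
    known = {(d.lower(), g.lower()) for d, g in known_pairs}
    return {pair for pair in candidates if pair in known}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(predicted, gold):
    """Standard set-based evaluation of extracted pairs against a gold standard."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)


# Hypothetical toy inputs (not from the paper).
ABSTRACTS = [
    "Warfarin dose requirements are associated with CYP2C9 and VKORC1 variants.",
    "Clopidogrel response differs among CYP2C19 poor metabolizers.",
    "Warfarin was co-administered with aspirin; ALB levels were measured.",
]
DRUGS = ["warfarin", "clopidogrel", "aspirin"]
GENES = ["CYP2C9", "VKORC1", "CYP2C19", "ALB"]
KNOWN_PAIRS = [("warfarin", "CYP2C9"), ("clopidogrel", "CYP2C19"), ("aspirin", "PTGS1")]
# Gold pairs are stored lowercased to match the extractor's normalization.
GOLD = {("warfarin", "cyp2c9"), ("warfarin", "vkorc1"), ("clopidogrel", "cyp2c19")}

if __name__ == "__main__":
    candidates = cooccurring_pairs(ABSTRACTS, DRUGS, GENES)
    conditioned = conditional_pairs(candidates, KNOWN_PAIRS)
    for name, pairs in (("unconditioned", candidates), ("conditional", conditioned)):
        p, r, f = precision_recall_f1(pairs, GOLD)
        print(name, round(p, 3), round(r, 3), round(f, 3))
    # Consistency check on the abstract's own figures: P = 0.345, R = 0.481.
    print(round(f1(0.345, 0.481), 3))  # 0.402
```

On the toy data, the unconditioned step yields five candidate pairs (precision 0.6, recall 1.0). Filtering by the toy background set removes the spurious (warfarin, alb) and (aspirin, alb) pairs, but it also drops the true (warfarin, vkorc1) pair because that pair is absent from the background set (precision 1.0, recall 0.667). This mirrors, in miniature, the precision-for-recall trade-off reported above. The final line confirms that F1 = 2PR/(P+R) with the reported conditional precision and recall reproduces the reported F1 of 0.402.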