Journal of Biomedical Informatics
An important task in pharmacogenomics (PGx) studies is to identify genetic variants that may affect drug response. The success of many systematic and integrative computational approaches to PGx depends on the availability of accurate, comprehensive, and machine-understandable drug-gene relationship knowledge bases. The scientific literature is one of the most comprehensive sources of PGx-specific drug-gene relationships. However, the major barrier to accessing this information is that it is buried in a large volume of free text with limited machine understandability. There is therefore a need for automatic approaches that extract structured PGx-specific drug-gene relationships from unstructured free-text literature. In this study, we developed a conditional relationship extraction approach that extracts PGx-specific drug-gene pairs from 20 million MEDLINE abstracts, using known drug-gene pairs as prior knowledge. We demonstrate that the conditional approach significantly improves precision and F1 measure over the unconditioned approach, at the cost of lower recall (precision: 0.345 vs. 0.11; recall: 0.481 vs. 1.00; F1: 0.402 vs. 0.201). A co-occurrence-based method serves as the underlying relationship extraction method because of its simplicity. It can be replaced by, or combined with, more advanced methods such as machine learning or natural language processing approaches to further improve the performance of drug-gene relationship extraction from free text. Our method is not limited to drug-gene relationships; it can be generalized to extract other types of relationships when related background knowledge bases exist.
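The co-occurrence extraction and the evaluation measures mentioned above can be illustrated with a minimal Python sketch. The abstract does not specify how the known drug-gene pairs are applied, so the conditioning rule here (keep pairs only from abstracts that mention at least one known pair) is a hypothetical choice for illustration, not the authors' method. The function names, the lexicons, and the toy sentences are likewise assumptions and do not come from the study.

```python
import re
from itertools import product


def cooccurring_pairs(sentences, drugs, genes):
    """Return (drug, gene) pairs that are mentioned in the same sentence."""
    pairs = set()
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        found_drugs = {d for d in drugs if d.lower() in tokens}
        found_genes = {g for g in genes if g.lower() in tokens}
        pairs.update(product(found_drugs, found_genes))
    return pairs


def unconditioned_pairs(documents, drugs, genes):
    """Plain co-occurrence over every abstract."""
    pairs = set()
    for sentences in documents:
        pairs |= cooccurring_pairs(sentences, drugs, genes)
    return pairs


def conditioned_pairs(documents, drugs, genes, known_pairs):
    """Hypothetical conditioning rule (an assumption, not from the source):
    keep pairs only from abstracts that mention at least one known pair."""
    pairs = set()
    for sentences in documents:
        doc_pairs = cooccurring_pairs(sentences, drugs, genes)
        if doc_pairs & known_pairs:
            pairs |= doc_pairs
    return pairs


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(predicted, gold):
    """Score a predicted set of pairs against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Toy data, invented for illustration only; each abstract is a list of sentences.
documents = [
    ["Warfarin dose is influenced by VKORC1 variants.",
     "CYP2C9 also affects warfarin clearance."],
    ["Aspirin was given to patients; TP53 expression was measured."],
]
drugs = {"warfarin", "aspirin"}
genes = {"VKORC1", "CYP2C9", "TP53"}
known_pairs = {("warfarin", "VKORC1")}
gold = {("warfarin", "VKORC1"), ("warfarin", "CYP2C9")}

unconditioned = unconditioned_pairs(documents, drugs, genes)
conditioned = conditioned_pairs(documents, drugs, genes, known_pairs)
print(precision_recall_f1(unconditioned, gold))
print(precision_recall_f1(conditioned, gold))
```

On the reported figures, `f1_score(0.345, 0.481)` rounds to 0.402, which matches the stated F1 for the conditional approach. The same formula applied to 0.11 and 1.00 gives about 0.198 rather than the reported 0.201, presumably because the published precision is rounded.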