Using text to build semantic networks for pharmacogenomics

Authors:
Adrien Coulet;Nigam H. Shah;Yael Garten;Mark Musen;Russ B. Altman
Affiliations:
Department of Medicine, 300 Pasteur Drive, Room S101, Mail Code 5110, Stanford University, Stanford, CA 94305, USA and Department of Genetics, Mail Stop-5120, Stanford University, Stanford, CA 943 ...;Department of Medicine, 300 Pasteur Drive, Room S101, Mail Code 5110, Stanford University, Stanford, CA 94305, USA;Stanford Biomedical Informatics, 251 Campus Drive, MSOB, Room X215, Mail Code 5479, Stanford University, Stanford, CA 94305, USA;Department of Medicine, 300 Pasteur Drive, Room S101, Mail Code 5110, Stanford University, Stanford, CA 94305, USA;Department of Medicine, 300 Pasteur Drive, Room S101, Mail Code 5110, Stanford University, Stanford, CA 94305, USA and Department of Genetics, Mail Stop-5120, Stanford University, Stanford, CA 943 ...
Venue:
Journal of Biomedical Informatics
Year:
2010

Abstract

Most pharmacogenomics knowledge is contained in the text of published studies and is therefore not available for automated computation. Natural Language Processing (NLP) techniques for extracting relationships in specific domains often rely on hand-built rules and domain-specific ontologies to achieve good performance. In a new and evolving field such as pharmacogenomics (PGx), such rules and ontologies may not be available. Recent progress in syntactic parsing, applied to a large corpus of pharmacogenomics text, provides new opportunities for automated relationship extraction. We describe an ontology of PGx relationships built from a lexicon of key pharmacogenomic entities and a syntactic parse of more than 87 million sentences from 17 million MEDLINE abstracts. We used the syntactic structure of PGx statements to systematically extract commonly occurring relationships and to map them to a common schema. Our extracted relationships have a precision of 70-87.7% and involve not only key PGx entities such as genes, drugs, and phenotypes (e.g., VKORC1, warfarin, clotting disorder), but also critical entities that are frequently modified by these key entities (e.g., VKORC1 polymorphism, warfarin response, clotting disorder treatment). The result of our analysis is a network of 40,000 relationships between more than 200 entity types with clear semantics. This network is used to guide the curation of PGx knowledge and to provide a computable resource for knowledge discovery.
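To make the extraction idea in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. It assumes each sentence has already been parsed into (label, head, dependent) triples with labels in the style of Stanford typed dependencies (nsubj, dobj, nn); the abstract does not say which parser or labels were used. The lexicon contents and the helper names resolve_entity and extract_relations are hypothetical. The sketch finds subject-verb-object patterns whose arguments are either key entities or nouns modified by a key entity, then counts the typed patterns.

```python
from collections import Counter, defaultdict

# Hypothetical mini-lexicon of key PGx entities (term -> entity type).
LEXICON = {"VKORC1": "Gene", "warfarin": "Drug", "clotting disorder": "Phenotype"}


def resolve_entity(noun, modifiers, lexicon):
    """Return (text, type) for a key entity or a noun modified by one, else None."""
    if noun in lexicon:
        return noun, lexicon[noun]
    for mod in modifiers.get(noun, []):
        if mod in lexicon:
            # e.g. "polymorphism" modified by the gene "VKORC1"
            # becomes the composite type "Gene polymorphism".
            return f"{mod} {noun}", f"{lexicon[mod]} {noun}"
    return None


def extract_relations(dependencies, lexicon=LEXICON):
    """Extract (subject, verb, object) relations from one sentence's dependencies.

    `dependencies` is a list of (label, head, dependent) triples. This input
    format is an assumption made for illustration.
    """
    subjects, objects = {}, {}
    modifiers = defaultdict(list)
    for label, head, dependent in dependencies:
        if label == "nsubj":
            subjects[head] = dependent
        elif label == "dobj":
            objects[head] = dependent
        elif label == "nn":
            modifiers[head].append(dependent)

    relations = []
    for verb, subj in subjects.items():
        if verb not in objects:
            continue
        s = resolve_entity(subj, modifiers, lexicon)
        o = resolve_entity(objects[verb], modifiers, lexicon)
        if s and o:
            relations.append((s, verb, o))
    return relations


# Hand-written, lemmatized parse of "VKORC1 polymorphisms affect warfarin response."
parsed = [
    ("nn", "polymorphism", "VKORC1"),
    ("nsubj", "affect", "polymorphism"),
    ("nn", "response", "warfarin"),
    ("dobj", "affect", "response"),
]
relations = extract_relations(parsed)

# Count typed patterns; over a large corpus, frequent patterns would be
# the candidates for relationship types in a common schema.
type_counts = Counter((s[1], verb, o[1]) for s, verb, o in relations)
print(relations)
print(type_counts)
```

Counting patterns by entity type rather than by surface term is one simple way to surface commonly occurring relationships. Mapping verbs and nominalizations onto a shared relation vocabulary, which the abstract also mentions, is left out of this sketch.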