Content analysis: What are they talking about?
Computers & Education - Methodological issue in researching CSCL
Efficient Extraction of Protein-Protein Interactions from Full-Text Articles
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Using text to build semantic networks for pharmacogenomics
Journal of Biomedical Informatics
The DDI corpus: An annotated corpus with pharmacological substances and drug-drug interactions
Journal of Biomedical Informatics
Publications that report genotype-drug interaction findings, as well as manually curated databases such as DrugBank and PharmGKB, are essential to advancing pharmacogenomics, a relatively new area merging pharmacology and genomic research. Natural language processing (NLP) methods can be very useful for automatically extracting knowledge such as gene-drug interactions, offering researchers immediate access to published findings and giving curators a shortcut for their work. We present a corpus of gene-drug interactions for training and evaluating systems that extract those interactions. The corpus includes 551 sentences, each mentioning both a drug and a gene, drawn from about 600 journals found to be relevant to pharmacogenomics through an analysis of gene-drug relationships in the PharmGKB knowledgebase. We evaluated basic approaches to automatic extraction, including gene and drug co-occurrence, co-occurrence plus interaction terms, and a linguistic pattern-based method. The linguistic pattern method had the highest precision (96.61%) but the lowest recall (7.30%), for an F-score of 13.57%. Basic co-occurrence yielded 68.99% precision; adding an interaction term increased precision only slightly (to 69.60%), less than might be expected. Co-occurrence is a reasonable baseline method, and the pattern-based method is a promising approach if enough patterns can be generated to address recall. The corpus is available at http://diego.asu.edu/index.php/projects
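The co-occurrence baselines and the F-score arithmetic described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lexicons `GENES`, `DRUGS`, and `INTERACTION_TERMS`, and the function names `cooccurrence` and `f_score`, are hypothetical placeholders chosen for illustration, and the matching is simple case-insensitive token lookup. The F-score is assumed to be the balanced harmonic mean of precision and recall, which reproduces the reported 13.57%.

```python
import re

# Hypothetical toy lexicons (assumptions for illustration only);
# the dictionaries actually used in the study are not given here.
GENES = {"CYP2D6", "VKORC1", "TPMT"}
DRUGS = {"warfarin", "codeine", "azathioprine"}
INTERACTION_TERMS = {"metabolizes", "inhibits", "affects", "associated"}


def cooccurrence(sentence, require_interaction_term=False):
    """Return sorted (gene, drug) pairs mentioned in the same sentence.

    With require_interaction_term=True, pairs are returned only if the
    sentence also contains at least one interaction term.
    """
    # Case-insensitive set of alphanumeric tokens in the sentence.
    lowered = {w.lower() for w in re.findall(r"[A-Za-z0-9]+", sentence)}
    genes = sorted(g for g in GENES if g.lower() in lowered)
    drugs = sorted(d for d in DRUGS if d.lower() in lowered)
    if require_interaction_term and not (INTERACTION_TERMS & lowered):
        return []
    return [(g, d) for g in genes for d in drugs]


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(cooccurrence("CYP2D6 metabolizes codeine."))
    print(cooccurrence("CYP2D6 and codeine were studied.",
                       require_interaction_term=True))
    # Precision and recall reported for the pattern-based method.
    print(round(100 * f_score(0.9661, 0.0730), 2))  # 13.57
```

The last line shows why very high precision does not rescue the pattern-based method: the harmonic mean is dominated by the smaller of the two values, so a recall of 7.30% keeps the F-score low.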