The DDI corpus: An annotated corpus with pharmacological substances and drug-drug interactions

Authors:
María Herrero-Zazo;Isabel Segura-Bedmar;Paloma Martínez;Thierry Declerck
Venue:
Journal of Biomedical Informatics
Year:
2013

Abstract

The management of drug-drug interactions (DDIs) is a critical issue because of the overwhelming amount of information available on them. Natural Language Processing (NLP) techniques can reduce the time healthcare professionals spend reviewing the biomedical literature. However, NLP techniques rely mostly on the availability of annotated corpora. While there are several corpora annotated with biological entities and their relationships, there is a lack of corpora annotated with pharmacological substances and DDIs. Moreover, other works in this field have focused on pharmacokinetic (PK) DDIs only, not on pharmacodynamic (PD) DDIs. To address this problem, we have created a manually annotated corpus consisting of 792 texts selected from the DrugBank database and 233 Medline abstracts. This fine-grained corpus has been annotated with a total of 18,502 pharmacological substances and 5,028 DDIs, including both PK and PD interactions. The quality and consistency of the annotation process have been ensured through the creation of annotation guidelines and evaluated by measuring the inter-annotator agreement between two annotators. The agreement was almost perfect (Kappa up to 0.96 and generally over 0.80), except for the DDIs in the Medline database (0.55-0.72). The DDI corpus has been used in the SemEval 2013 DDIExtraction challenge as a gold standard for the evaluation of information extraction techniques applied to the recognition of pharmacological substances and the detection of DDIs in biomedical texts. DDIExtraction 2013 attracted wide attention, with a total of 14 teams from 7 countries. For the task of recognition and classification of pharmacological names, the best system achieved an F1 of 71.5%, while for the detection and classification of DDIs the best result was an F1 of 65.1%.
These results show that the quality of the corpus is sufficient for training and testing NLP techniques applied to the field of pharmacovigilance. The DDI corpus and the annotation guidelines are free to use for academic research and are available at http://labda.inf.uc3m.es/ddicorpus.
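
To make the agreement figures above concrete, the following is a minimal sketch of how a kappa score between two annotators could be computed. It assumes Cohen's kappa (the abstract says only "Kappa"), and the function name `cohen_kappa`, the labels `ddi` and `none`, and the eight toy annotations are hypothetical, not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Assumption: the abstract does not name the kappa variant; Cohen's
    kappa is used here because it is defined for exactly two annotators.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label
    # if each annotator labelled at random with their own label rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_chance = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    if p_chance == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical toy data: does a candidate drug pair interact or not?
annotator_1 = ["ddi", "ddi", "none", "none", "ddi", "none", "none", "none"]
annotator_2 = ["ddi", "none", "none", "none", "ddi", "none", "ddi", "none"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

In this toy example the annotators agree on 6 of 8 items (0.75 raw agreement), but because agreement by chance alone would be about 0.53, the kappa is much lower than the raw agreement. This is why kappa, rather than percentage agreement, is the usual way to report annotation consistency.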