Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports

Authors:
Harsha Gurulingappa;Abdul Mateen Rajput;Angus Roberts;Juliane Fluck;Martin Hofmann-Apitius;Luca Toldo
Affiliations:
Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754 Sankt Augustin, Germany and Bonn-Aachen International Center for Information Technology (B-IT), Dah ...;Department of Knowledge Management, Merck KGaA, Frankfurterstraße 250, 64293 Darmstadt, Germany;Department of Computer Science, University of Sheffield, Sheffield S1 4DP, United Kingdom;Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754 Sankt Augustin, Germany;Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754 Sankt Augustin, Germany and Bonn-Aachen International Center for Information Technology (B-IT), Dah ...;Department of Knowledge Management, Merck KGaA, Frankfurterstraße 250, 64293 Darmstadt, Germany
Venue:
Journal of Biomedical Informatics
Year:
2012

Abstract

A significant amount of information about drug-related safety issues, such as adverse effects, is published in medical case reports that, because of their unstructured nature, can currently be explored only by human readers. The work presented here aims to generate a systematically annotated corpus that can support the development and validation of methods for the automatic extraction of drug-related adverse effects from medical case reports. The documents are systematically double annotated in several rounds to ensure consistent annotations. The annotated documents are then harmonized to generate representative consensus annotations. To demonstrate an example use case, the corpus was employed to train and validate models for classifying informative versus non-informative sentences. A Maximum Entropy classifier trained with simple features and evaluated by 10-fold cross-validation achieved an F1 score of 0.70, indicating a potentially useful application of the corpus.
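
The evaluation protocol named in the abstract (10-fold cross-validation scored by F1) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `f1_score` and `k_fold_indices` are hypothetical, the folds are assigned round-robin rather than by whatever scheme the study used, and no classifier or corpus data is included.

```python
from typing import Iterator, List, Sequence, Tuple


def f1_score(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` label."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_items: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; item i is held out in fold i % k."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test
```

In use, a model would be trained on each `train` split, applied to the matching `test` split, and the per-fold F1 values averaged (or the predictions pooled) to give a single score.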