The EU-ADR corpus: Annotated drugs, diseases, targets, and their relationships

  • Authors:
  • Erik M. Van Mulligen; Annie Fourrier-Reglat; David Gurwitz; Mariam Molokhia; Ainhoa Nieto; Gianluca Trifiro; Jan A. Kors; Laura I. Furlong

  • Affiliations:
  • Dept. of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands; Université de Bordeaux, U657, F-33000 Bordeaux, France; Tel-Aviv University, Tel Aviv, Israel; King's College London, London, United Kingdom; University of Santiago de Compostela, Santiago de Compostela, Spain; Dept. of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands and University of Messina, Messina, Italy; Dept. of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands; Research Programme on Biomedical Informatics (GRIB), IMIM (Hospital del Mar Research Institute), Universitat Pompeu Fabra, Barcelona, Spain

  • Venue:
  • Journal of Biomedical Informatics
  • Year:
  • 2012

Abstract

Corpora annotated with specific entities and relationships are essential for training and evaluating text-mining systems that extract structured information from large bodies of text. In this paper we describe an approach in which a named-entity recognition system produces an initial annotation, which annotators then revise using a web-based interface. The agreement figures show that inter-annotator agreement is much higher than agreement with the system-provided annotations. The corpus has been annotated for drugs, disorders, genes, and their inter-relationships. For each of the drug-disorder, drug-target, and target-disorder relations, three experts annotated a set of 100 abstracts. These annotated relationships will be used to train and evaluate text-mining software that captures these relationships in text.
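
The comparison between inter-annotator agreement and agreement with the system-provided annotations can be illustrated with a small sketch. The abstract does not state which agreement measure was used, so the Dice overlap on exact-match annotations below is an assumption, and the annotator names, offsets, and entity labels are invented toy data rather than values from the corpus.

```python
from itertools import combinations

# Toy data (hypothetical): each annotation is (start_offset, end_offset, entity_type).
annotations = {
    "system":   {(0, 7, "drug"), (20, 28, "disorder"), (40, 44, "gene")},
    "expert_a": {(0, 7, "drug"), (20, 32, "disorder"), (50, 56, "gene")},
    "expert_b": {(0, 7, "drug"), (20, 32, "disorder"), (50, 56, "gene"), (60, 65, "drug")},
    "expert_c": {(0, 7, "drug"), (20, 32, "disorder")},
}

def pairwise_agreement(a, b):
    """Dice overlap between two annotation sets, counting only exact matches.

    This measure is an assumption made for illustration; it is not
    necessarily the one used for the corpus.
    """
    if not a and not b:
        return 1.0  # two empty annotation sets agree trivially
    return 2 * len(a & b) / (len(a) + len(b))

def mean_agreement(pairs):
    """Average the pairwise agreement over a list of (name, name) pairs."""
    scores = [pairwise_agreement(annotations[x], annotations[y]) for x, y in pairs]
    return sum(scores) / len(scores)

experts = ["expert_a", "expert_b", "expert_c"]

# Agreement among the human annotators (all unordered expert pairs).
inter_annotator = mean_agreement(list(combinations(experts, 2)))

# Agreement between each human annotator and the system's first-pass annotation.
expert_vs_system = mean_agreement([(e, "system") for e in experts])

print(f"inter-annotator agreement: {inter_annotator:.3f}")
print(f"agreement with system:     {expert_vs_system:.3f}")
```

In this toy setup the experts correct the system's span boundaries in the same way, so they agree with each other more than with the initial annotation, which mirrors the pattern the abstract reports. A chance-corrected measure such as Cohen's kappa could be substituted for the Dice overlap without changing the structure of the comparison.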