Scaling up biomedical event extraction to the entire PubMed

  • Authors:
  • Jari Björne; Filip Ginter; Sampo Pyysalo; Jun'ichi Tsujii; Tapio Salakoski

  • Affiliations:
  • University of Turku, Turku, Finland and Turku Centre for Computer Science (TUCS), Turku, Finland; University of Turku, Turku, Finland; University of Tokyo, Tokyo, Japan; University of Tokyo, Tokyo, Japan and University of Manchester, Manchester, UK; University of Turku, Turku, Finland and Turku Centre for Computer Science (TUCS), Turku, Finland

  • Venue:
  • BioNLP '10 Proceedings of the 2010 Workshop on Biomedical Natural Language Processing
  • Year:
  • 2010

Abstract

We present the first full-scale event extraction experiment covering the titles and abstracts of all PubMed citations. Extraction is performed using a pipeline composed of state-of-the-art methods: the BANNER named entity recognizer, the McClosky-Charniak domain-adapted parser, and the Turku Event Extraction System. We analyze the statistical properties of the resulting dataset and evaluate both the core event extraction component and the negation and speculation detection components of the system. Further, we study in detail the set of extracted events relevant to the apoptosis pathway to gain insight into the biological relevance of the results. The dataset, consisting of 19.2 million occurrences of 4.5 million unique events, is freely available for use in research at http://bionlp.utu.fi/.
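The abstract distinguishes event occurrences (19.2 million) from unique events (4.5 million). A minimal sketch of that distinction follows, assuming that an event is identified by its type plus its set of (role, argument) pairs, with entity names case-folded and nested events allowed as arguments. The `Event`, `make_event`, and `count_unique` names are hypothetical illustrations, not part of the Turku Event Extraction System, and the paper's actual definition of event identity may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple, Union


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: a type plus (role, argument) pairs.

    An argument is either an entity name (str) or a nested Event.
    frozen=True makes instances hashable, so they can be counted.
    """
    etype: str
    args: FrozenSet[Tuple[str, Union[str, "Event"]]]


def make_event(etype: str, **args: Union[str, Event]) -> Event:
    """Build an event, trimming and case-folding entity names (assumed normalization)."""
    norm = frozenset(
        (role, a.strip().lower() if isinstance(a, str) else a)
        for role, a in args.items()
    )
    return Event(etype, norm)


def count_unique(occurrences: Iterable[Event]) -> Counter:
    """Map each unique event to the number of times it occurs."""
    return Counter(occurrences)


occurrences = [
    make_event("Phosphorylation", Theme="STAT3"),
    make_event("Phosphorylation", Theme="stat3"),
    make_event("Positive_regulation",
               Theme=make_event("Phosphorylation", Theme="STAT3"),
               Cause="IL-6"),
]
counts = count_unique(occurrences)
print(len(occurrences), len(counts))  # 3 2
```

Using a frozenset for the arguments makes identity independent of argument order. Applied to the abstract's figures, 19.2 million occurrences over 4.5 million unique events works out to roughly 4.3 occurrences per unique event on average.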