PubMed-scale event extraction for post-translational modifications, epigenetics and protein structural relations

Authors:
Jari Björne;Sofie Van Landeghem;Sampo Pyysalo;Tomoko Ohta;Filip Ginter;Yves Van de Peer;Sophia Ananiadou;Tapio Salakoski
Affiliations:
Turku Centre for Computer Science (TUCS), Joukahaisenkatu, Turku, Finland and University of Turku, Finland;VIB, Technologiepark, Gent, Belgium and Ghent University, Gent, Belgium;National Centre for Text Mining and University of Manchester, Manchester Interdisciplinary Biocentre, Manchester, UK;National Centre for Text Mining and University of Manchester, Manchester, UK;University of Turku, Finland;VIB, Technologiepark, Gent, Belgium and Ghent University, Gent, Belgium;National Centre for Text Mining and University of Manchester, Manchester, UK;Turku Centre for Computer Science (TUCS), Joukahaisenkatu, Turku, Finland and University of Turku, Finland
Venue:
BioNLP '12 Proceedings of the 2012 Workshop on Biomedical Natural Language Processing
Year:
2012



Abstract

Recent efforts in biomolecular event extraction have focused mainly on core event types involving genes and proteins, such as gene expression, protein-protein interactions, and protein catabolism. The BioNLP'11 Shared Task extended event extraction to sub-protein events and relations in the Epigenetics and Post-translational Modifications (EPI) and Protein Relations (REL) tasks. In this study, we apply the Turku Event Extraction System, the best-performing system for these tasks, to all PubMed abstracts and all available PubMed Central (PMC) full-text articles, extracting 1.4M EPI events and 2.2M REL relations from 21M abstracts and 372K articles. We introduce several entity normalization algorithms for genes, proteins, protein complexes, and protein components, aiming to uniquely identify these biological entities. This normalization allows the extracted events and relations to be mapped directly to post-translational modifications in UniProt, epigenetic data in PubMeth, functional domains in InterPro, and macromolecular structures in PDB. The resulting dataset of detailed protein information is a unique text mining resource that extends existing PubMed-scale event extraction efforts to a finer level of detail. The methods and data introduced in this study are freely available from bionlp.utu.fi.
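The abstract does not describe how the normalization algorithms work. As a rough illustration of the general idea only, here is a minimal sketch assuming a simple dictionary lookup: protein mentions are reduced to a canonical string form, mapped to an identifier, and the resulting (identifier, event type) pair is checked against a set of curated annotations. The lexicon, the `PROT:` identifiers, the `CURATED_PTMS` table, and all function names are hypothetical stand-ins for resources such as UniProt. This is not the authors' method, which covers genes, complexes, and protein components and is likely considerably more involved.

```python
import re
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL toy lexicon mapping canonical surface forms to identifiers.
# The "PROT:" identifiers are invented for illustration; a real system would
# draw synonyms and accessions from resources such as UniProt or Entrez Gene.
LEXICON = {
    "p53": "PROT:0001",
    "tp53": "PROT:0001",
    "tumor protein p53": "PROT:0001",
    "histone h3": "PROT:0002",
    "h3": "PROT:0002",
}

# HYPOTHETICAL curated annotations: (identifier, modification type) pairs,
# standing in for database records such as UniProt PTM features.
CURATED_PTMS = {
    ("PROT:0001", "Phosphorylation"),
    ("PROT:0002", "Methylation"),
}


@dataclass(frozen=True)
class Event:
    event_type: str  # EPI-style event type, e.g. "Phosphorylation"
    theme: str       # protein mention exactly as it appears in the text


def canonical_form(mention: str) -> str:
    """Lowercase, trim surrounding punctuation, and treat hyphens like spaces."""
    text = mention.lower().strip(" .,;:()[]")
    return re.sub(r"[-\s]+", " ", text)


def normalize(mention: str) -> Optional[str]:
    """Return the identifier for a mention, or None if the lexicon lacks it."""
    return LEXICON.get(canonical_form(mention))


def map_events(events):
    """Pair each event with its identifier and whether curated data already has it."""
    results = []
    for ev in events:
        ident = normalize(ev.theme)
        in_db = ident is not None and (ident, ev.event_type) in CURATED_PTMS
        results.append((ev, ident, in_db))
    return results


if __name__ == "__main__":
    sample = [
        Event("Phosphorylation", "p53"),
        Event("Acetylation", "TP53"),
        Event("Methylation", "Histone H3"),
    ]
    for ev, ident, in_db in map_events(sample):
        print(ev.event_type, ev.theme, ident, in_db)
```

Because both "p53" and "TP53" normalize to the same identifier, events from different surface forms can be aggregated per protein. Events whose pair is absent from the curated set (the acetylation event in the sample) are the kind of text-only findings that such a mapping makes visible.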