Not all links are equal: exploiting dependency types for the extraction of protein-protein interactions from text

Authors:
Philippe Thomas;Stefan Pietschmann;Illés Solt;Domonkos Tikk;Ulf Leser
Affiliations:
Humboldt-University of Berlin, Berlin, Germany;Humboldt-University of Berlin, Berlin, Germany;Budapest University of Technology and Economics, Budapest, Hungary;Humboldt-University of Berlin, Berlin, Germany and Budapest University of Technology and Economics, Budapest, Hungary;Humboldt-University of Berlin, Berlin, Germany
Venue:
BioNLP '11 Proceedings of BioNLP 2011 Workshop
Year:
2011

Abstract

The extraction of protein-protein interactions (PPIs) reported in scientific publications is one of the most studied topics in Text Mining in the Life Sciences, as such algorithms can substantially decrease the effort for database curators. The currently best methods for this task are based on analyzing the dependency tree (DT) representation of sentences. Many approaches use only topological features and thus do not yet fully exploit the information contained in DTs. Using a pattern matching approach, we show that incorporating the grammatical information encoded in the types of the dependencies in DTs noticeably improves extraction performance. We automatically infer a large set of linguistic patterns using only information about interacting proteins. Patterns are then refined based on shallow linguistic features and the semantics of dependency types. Together, these refinements lead to a total improvement of 17.2 percentage points in F1, as evaluated on five publicly available PPI corpora. More than half of that improvement is gained by properly handling dependency types. Our method provides a general framework for building task-specific relationship extraction methods that do not require annotated training data. Furthermore, our observations suggest ways to improve other relation extraction approaches.
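To make the contrast between purely topological patterns and type-aware patterns concrete, the following is a minimal sketch, not the authors' implementation. It assumes a dependency tree given as (head, dependency type, dependent) triples, takes the shortest path between two protein mentions as the pattern, and compares matching with and without dependency types. The sentences, the labels `nsubj`, `dobj`, and `prep_unlike`, and all function names are hypothetical and chosen for illustration only.

```python
from collections import deque


def build_graph(edges):
    """Turn (head, dep_type, dependent) triples into an undirected adjacency map."""
    graph = {}
    for head, dep, child in edges:
        graph.setdefault(head, []).append((child, dep, "down"))
        graph.setdefault(child, []).append((head, dep, "up"))
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the steps (dep_type, direction, node) or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, dep, direction in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(dep, direction, nxt)]))
    return None


def pattern_for(edges, prot1, prot2, use_types=True):
    """Build a pattern from the path between two proteins.

    Protein names are replaced by the placeholder PROT. With use_types=False
    every dependency label becomes '*', leaving only the tree topology.
    Returns None if the two proteins are not connected.
    """
    path = shortest_path(build_graph(edges), prot1, prot2)
    if path is None:
        return None
    proteins = {prot1, prot2}
    return tuple(
        (dep if use_types else "*", direction, "PROT" if node in proteins else node)
        for dep, direction, node in path
    )


# Toy sentence known to describe an interaction: "ProtA binds ProtB"
known = [("binds", "nsubj", "ProtA"), ("binds", "dobj", "ProtB")]

# Toy candidate without an interaction: "ProtC binds DNA unlike ProtD"
candidate = [
    ("binds", "nsubj", "ProtC"),
    ("binds", "dobj", "DNA"),
    ("binds", "prep_unlike", "ProtD"),
]

typed_pattern = pattern_for(known, "ProtA", "ProtB", use_types=True)
untyped_pattern = pattern_for(known, "ProtA", "ProtB", use_types=False)

# Topology alone cannot tell the two sentences apart; dependency types can.
untyped_match = pattern_for(candidate, "ProtC", "ProtD", use_types=False) == untyped_pattern
typed_match = pattern_for(candidate, "ProtC", "ProtD", use_types=True) == typed_pattern
```

In this toy setting the untyped pattern also fires on the candidate sentence, a false positive, while the typed pattern does not. This is the kind of effect the abstract attributes to handling dependency types properly, shown here under simplified assumptions.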