Not all links are equal: exploiting dependency types for the extraction of protein-protein interactions from text

  • Authors:
  • Philippe Thomas;Stefan Pietschmann;Illés Solt;Domonkos Tikk;Ulf Leser

  • Affiliations:
  • Humboldt-University of Berlin, Berlin, Germany;Humboldt-University of Berlin, Berlin, Germany;Budapest University of Technology and Economics, Budapest, Hungary;Humboldt-University of Berlin, Berlin, Germany and Budapest University of Technology and Economics, Budapest, Hungary;Humboldt-University of Berlin, Berlin, Germany

  • Venue:
  • BioNLP '11 Proceedings of BioNLP 2011 Workshop
  • Year:
  • 2011

Abstract

The extraction of protein-protein interactions (PPIs) reported in scientific publications is one of the most studied topics in text mining for the life sciences, as such algorithms can substantially reduce the effort required of database curators. The best current methods for this task analyze the dependency tree (DT) representation of sentences. Many approaches exploit only topological features and thus do not fully exploit the information contained in DTs. Using a pattern matching approach, we show that incorporating the grammatical information encoded in the dependency types of DTs noticeably improves extraction performance. We automatically infer a large set of linguistic patterns using only information about interacting proteins. Patterns are then refined based on shallow linguistic features and the semantics of dependency types. Together, these refinements lead to a total improvement of 17.2 percentage points in F1, as evaluated on five publicly available PPI corpora. More than half of that improvement is gained by properly handling dependency types. Our method provides a general framework for building task-specific relationship extraction methods that do not require annotated training data. Furthermore, our observations suggest ways to improve other relation extraction approaches.
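
To illustrate the difference between topology-only and type-aware matching described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a dependency tree given as (head, dependent, type) triples and represents a pattern as the shortest path between two protein mentions. The sentences, the token names `ProtA` and `ProtB`, the dependency labels, and the helper functions `shortest_path` and `matches` are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy parse of "ProtA binds to ProtB" as
# (head, dependent, dependency_type) triples. The labels are illustrative.
POSITIVE = [
    ("binds", "ProtA", "nsubj"),
    ("binds", "ProtB", "prep_to"),
]

# Hypothetical toy parse of "ProtA is expressed without ProtB".
# Same tree shape between the two proteins, different dependency types.
CANDIDATE = [
    ("expressed", "ProtA", "nsubjpass"),
    ("expressed", "ProtB", "prep_without"),
]


def shortest_path(edges, source, target):
    """Breadth-first search over the tree, treated as undirected.

    Returns the path as a list of (dependency_type, direction) steps,
    where direction is "up" (dependent to head) or "down" (head to
    dependent), or None if the tokens are not connected.
    """
    graph = {}
    for head, dep, dep_type in edges:
        graph.setdefault(head, []).append((dep, dep_type, "down"))
        graph.setdefault(dep, []).append((head, dep_type, "up"))
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for neighbor, dep_type, direction in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(dep_type, direction)]))
    return None


def matches(path, pattern, use_types=True):
    """Compare a candidate path with a pattern.

    With use_types=False only the topology (sequence of directions) is
    compared; with use_types=True the dependency types must agree too.
    """
    if path is None or len(path) != len(pattern):
        return False
    if use_types:
        return path == pattern
    return [d for _, d in path] == [d for _, d in pattern]


# A pattern inferred from a sentence known to contain an interacting pair.
pattern = shortest_path(POSITIVE, "ProtA", "ProtB")
candidate_path = shortest_path(CANDIDATE, "ProtA", "ProtB")

topology_only = matches(candidate_path, pattern, use_types=False)
type_aware = matches(candidate_path, pattern, use_types=True)
```

In this toy setting the topology-only comparison accepts the candidate sentence, while the type-aware comparison rejects it because the dependency types along the path differ. Exact equality of types is a simplification made here; the abstract describes refining patterns using the semantics of dependency types, which this sketch does not attempt to reproduce.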