Semi-supervised abstraction-augmented string kernel for multi-level bio-relation extraction

  • Authors:
  • Pavel Kuksa; Yanjun Qi; Bing Bai; Ronan Collobert; Jason Weston; Vladimir Pavlovic; Xia Ning

  • Affiliations:
  • Department of Computer Science, Rutgers University; NEC Labs America, Princeton; NEC Labs America, Princeton; NEC Labs America, Princeton; Google Research, New York City; Department of Computer Science, Rutgers University; Computer Science Department, University of Minnesota

  • Venue:
  • ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
  • Year:
  • 2010

Abstract

Bio-relation extraction (bRE), an important goal in bio-text mining, involves subtasks that identify relationships between bio-entities in text at multiple levels, e.g., at the article, sentence or relation level. A key limitation of current bRE systems is that they are restricted by the availability of annotated corpora. In this work we introduce a semi-supervised approach that can tackle multi-level bRE via string comparisons with mismatches in the string kernel framework. Our string kernel implements an abstraction step that groups similar words into more abstract entities, which can be learned from unlabeled data. Specifically, two unsupervised models are proposed to capture contextual (local or global) semantic similarities between words from a large unannotated corpus. This Abstraction-augmented String Kernel (ASK) allows for better generalization of patterns learned from annotated data and provides a unified framework for solving bRE at multiple levels of detail. ASK improves on classic string kernels on four datasets and achieves state-of-the-art bRE performance without the need for complex linguistic features.
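
To make the abstraction idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a word-level k-gram kernel with exact matching (a spectrum kernel), swapped in for the mismatch kernel the abstract mentions to keep the example short. The function names, the `clusters` mapping, and the example sentences are all hypothetical; in particular, the hand-written word-to-cluster map merely stands in for the unsupervised models learned from unlabeled text.

```python
from collections import Counter


def kgram_features(tokens, k, abstraction=None):
    """Count contiguous word k-grams; optionally also count abstracted k-grams.

    `abstraction` is a hypothetical dict mapping a word to a cluster label.
    Words missing from the dict are kept unchanged.
    """
    feats = Counter()
    for i in range(len(tokens) - k + 1):
        gram = tuple(tokens[i:i + k])
        feats[("word", gram)] += 1
        if abstraction is not None:
            abs_gram = tuple(abstraction.get(w, w) for w in gram)
            feats[("abs", abs_gram)] += 1
    return feats


def kernel(s, t, k=2, abstraction=None):
    """Inner product of the k-gram count vectors of token lists s and t."""
    fs = kgram_features(s, k, abstraction)
    ft = kgram_features(t, k, abstraction)
    # Counter returns 0 for features that are absent from ft.
    return sum(count * ft[gram] for gram, count in fs.items())


# Toy example: two sentences with no surface bigram in common.
s = ["RAF1", "binds", "MEK1"]
t = ["BRAF", "interacts", "MEK1"]

# Assumed clusters; a real system would learn these from unlabeled text.
clusters = {"RAF1": "C1", "BRAF": "C1", "binds": "C2", "interacts": "C2"}

plain = kernel(s, t, k=2)
augmented = kernel(s, t, k=2, abstraction=clusters)
```

In this toy case the plain kernel is 0 because the sentences share no word bigram, while the augmented kernel is 2 because both sentences map to the same abstract bigrams. This is the kind of generalization across similar words that the abstraction step is meant to provide.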