Sentence identification of biological interactions using PATRICIA tree generated patterns and genetic algorithm optimized parameters

  • Authors:
  • Haibin Liu; Christian Blouin; Vlado Kešelj

  • Affiliations:
  • Faculty of Computer Science, Dalhousie University, 6050 University Avenue, Halifax, NS, Canada B3H 3A1 (all authors)

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2010

Abstract

An important task in information retrieval is to identify sentences that express significant relationships between key concepts. In this work, we propose a novel approach to automatically extract sentence patterns that contain interactions involving concepts of molecular biology. A pattern is defined here as a sequence of specialized Part-of-Speech (POS) tags that captures the structure of key sentences in the scientific literature. Each candidate sentence for the classification task is encoded as a POS array and then aligned to a collection of pre-extracted patterns. The quality of the alignment is expressed as a pairwise alignment score. The most innovative component of this work is the use of a genetic algorithm (GA) to maximize the classification performance of the alignment scoring scheme. The system achieves an average F-score of 0.796 in identifying sentences that describe interactions between co-occurring biological concepts. This performance is mostly affected by the quality of preprocessing steps such as term identification and POS tagging.
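
The alignment step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the alignment algorithm or its parameters, so everything below is an assumption made for illustration: a standard Needleman-Wunsch global alignment stands in for the paper's scoring scheme, the `match`, `mismatch`, `gap`, and `threshold` values are hypothetical examples of the kind of parameters a GA could tune, and tag names such as `PROT` are invented placeholders for the specialized POS tags.

```python
from typing import Iterable, Sequence


def alignment_score(
    sentence_tags: Sequence[str],
    pattern_tags: Sequence[str],
    match: float = 2.0,      # hypothetical reward for identical tags
    mismatch: float = -1.0,  # hypothetical penalty for differing tags
    gap: float = -1.0,       # hypothetical penalty for skipping a tag
) -> float:
    """Global (Needleman-Wunsch) alignment score of two tag sequences."""
    n, m = len(sentence_tags), len(pattern_tags)
    # prev[j]: best score aligning the sentence tags seen so far
    # with the first j pattern tags.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            same = sentence_tags[i - 1] == pattern_tags[j - 1]
            curr[j] = max(
                prev[j - 1] + (match if same else mismatch),  # align two tags
                prev[j] + gap,      # sentence tag aligned to a gap
                curr[j - 1] + gap,  # pattern tag aligned to a gap
            )
        prev = curr
    return prev[m]


def is_interaction_sentence(
    sentence_tags: Sequence[str],
    patterns: Iterable[Sequence[str]],
    threshold: float,
    **scoring: float,
) -> bool:
    """Flag a sentence if its best pattern alignment reaches the threshold."""
    best = max(alignment_score(sentence_tags, p, **scoring) for p in patterns)
    return best >= threshold


# Hypothetical tag sequences; the paper's real tag set is not given here.
patterns = [["PROT", "VB", "IN", "PROT"]]
sentence = ["PROT", "VB", "PROT"]
print(alignment_score(sentence, patterns[0]))
print(is_interaction_sentence(sentence, patterns, threshold=4.0))
```

In a setup like this, a GA would treat the tuple (`match`, `mismatch`, `gap`, `threshold`) as an individual and use classification performance on labelled sentences, for example the F-score, as the fitness function.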