The Frame-Based Module of the SUISEKI Information Extraction System

  • Authors: Christian Blaschke; Alfonso Valencia

  • Venue: IEEE Intelligent Systems
  • Year: 2002

Abstract

SUISEKI is an information extraction system that detects protein interactions in scientific text. It combines morphological, syntactic, and contextual information, which lets it recognize gene and protein names without organism-specific name dictionaries, with heuristics about how protein interactions tend to be expressed in text. The authors describe the rules (so-called "frames") currently included in the system, including their intrinsic value, coverage, and performance. At the level of individual sentences, this approach captures only a fraction of the interactions present. However, detection frequency and accuracy are clearly related: interactions that are detected frequently can be extracted with less than 20 percent error. The authors therefore propose that predefined frames, combined with statistical and linguistic methods, are a valid alternative for analyzing the interaction networks described in the molecular biology literature.
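
To make the idea of frames concrete, here is a minimal Python sketch of frame-based extraction followed by frequency filtering. It is not SUISEKI's implementation: the frame patterns, the crude protein-name pattern (a capitalized token containing a digit), the function names, and the `min_count` threshold are all assumptions chosen for illustration. The abstract states only that frequently detected interactions tend to be more accurate.

```python
import re
from collections import Counter

# Assumption: a crude name detector (capitalized token containing a digit).
# SUISEKI's real name detection uses morphological, syntactic, and
# contextual cues that are not reproduced here.
PROTEIN = r"[A-Z][A-Za-z0-9]*\d[A-Za-z0-9]*"

# Assumption: two hypothetical frames, each with protein slots "a" and "b".
FRAMES = [
    re.compile(
        rf"\b(?P<a>{PROTEIN}) (?:binds to|interacts with|activates) (?P<b>{PROTEIN})\b"
    ),
    re.compile(
        rf"\binteraction (?:of|between) (?P<a>{PROTEIN}) (?:and|with) (?P<b>{PROTEIN})\b"
    ),
]


def extract_interactions(sentences):
    """Count how often each unordered protein pair matches any frame."""
    counts = Counter()
    for sentence in sentences:
        for frame in FRAMES:
            for match in frame.finditer(sentence):
                # Sort the pair so that (A, B) and (B, A) are counted together.
                pair = tuple(sorted((match.group("a"), match.group("b"))))
                counts[pair] += 1
    return counts


def frequent_interactions(counts, min_count=2):
    """Keep only pairs detected at least min_count times."""
    return {pair: n for pair, n in counts.items() if n >= min_count}


sentences = [
    "Cdc28 binds to Cln2 in late G1.",
    "We observed the interaction between Cln2 and Cdc28.",
    "Rad51 interacts with Rad52.",
]
counts = extract_interactions(sentences)
print(frequent_interactions(counts, min_count=2))
```

In this toy corpus only the Cdc28/Cln2 pair is detected twice, so it is the only one that survives the threshold. A real system would need a far richer set of frames and a proper name recognizer. The sketch only shows how repeated detections can act as a confidence signal.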