Effect of utilizing terminology on extraction of protein-protein interaction information from biomedical literature

  • Authors:
  • Junko Hosaka;Judice L. Y. Koh;Akihiko Konagaya

  • Affiliations:
  • Genomic Sciences Center, Yokohama, Kanagawa, Japan;Institute for Infocomm Research, Singapore;Genomic Sciences Center, Yokohama, Kanagawa, Japan

  • Venue:
  • EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 2
  • Year:
  • 2003

Abstract

As the amount of on-line scientific literature in the biomedical domain increases, automatic processing has become a promising approach for accelerating research. We apply a syntactic parser trained on general-domain text to identify protein-protein interactions. One of the main difficulties obstructing the use of language processing is the prevalence of specialized terminology. Accordingly, we have created a specialized dictionary by compiling on-line glossaries, and have applied it to information extraction. We conducted preliminary experiments on one hundred sentences, and compared the extraction performance when (a) using only a general dictionary and (b) using the general dictionary together with our specialized dictionary. Contrary to our expectation, using only the general dictionary resulted in better performance (recall 93.0%, precision 91.0%) than the terminology-based approach (recall 92.9%, precision 89.6%).
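
The recall and precision figures quoted in the abstract follow the usual information-extraction definitions: precision is the share of extracted interactions that are correct, and recall is the share of true interactions that were extracted. The sketch below is only an illustration of those definitions, not the authors' evaluation code. The function name `precision_recall`, the representation of an interaction as an unordered pair of protein names, and the toy data are all assumptions introduced here.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for two sets of interaction pairs.

    Both arguments are sets of frozensets, so that ("A", "B") and
    ("B", "A") count as the same interaction (an assumption of this sketch).
    """
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the paper.
gold = {frozenset(p) for p in [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]}
extracted = {frozenset(p) for p in [("B", "A"), ("C", "D"), ("E", "X")]}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Running the same function once on the output obtained with the general dictionary alone and once on the output obtained with the added specialized dictionary would give the kind of side-by-side comparison the abstract reports.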