Extracting protein-protein interactions in biomedical literature using an existing syntactic parser

  • Authors:
  • Hyunchul Jang;Jaesoo Lim;Joon-Ho Lim;Soo-Jun Park;Seon-Hee Park;Kyu-Chul Lee

  • Affiliations:
  • Bioinformatics Research Team, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Republic of Korea (Hyunchul Jang, Jaesoo Lim, Joon-Ho Lim, Soo-Jun Park, Seon-Hee Park); Department of Computer Engineering, Chungnam National University, Daejeon, Republic of Korea (Kyu-Chul Lee)

  • Venue:
  • KDLL'06 Proceedings of the 2006 international conference on Knowledge Discovery in Life Science Literature
  • Year:
  • 2006

Abstract

We are developing an information extraction system for life science literature. We currently focus on PubMed abstracts and aim to extract named entities and their relationships, especially protein names and protein-protein interactions. We adopt methods from natural language processing, machine learning, and text processing, but we are not developing a new tagging or parsing technique. Developing a new tagger or parser specialized for life science literature is a very complex job, and it is not easy to get good results by tuning an existing parser or by training it without a sufficient corpus. These are all separate research topics; our goal is to extract information, not to develop tools that support the extraction task. In this paper, we introduce our method for using an existing full parser without training or tuning. After tagging sentences and extracting protein names, we simplify each sentence by substituting a single word for certain expressions, such as named entities and nouns. This sentence simplification reduces parsing errors and increases parsing precision. We then parse the simplified sentences with an existing syntactic parser and extract protein-protein interactions from its output. We show the effects of sentence simplification and syntactic parsing.
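
The simplification step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `simplify` and `restore`, the `PROTEINn` placeholder scheme, and the example sentence are all assumptions made for illustration. The protein names are assumed to come from an earlier tagging step, and the parser that would consume the simplified sentence is not shown.

```python
import re


def simplify(sentence, protein_names):
    """Replace each protein mention with a single placeholder token.

    Hypothetical helper: `protein_names` is assumed to come from an
    earlier named-entity tagging step that is not shown here.
    """
    mapping = {}
    simplified = sentence
    # Substitute longer names first so that a short name contained in a
    # longer one is not replaced on its own.
    for name in sorted(protein_names, key=len, reverse=True):
        placeholder = f"PROTEIN{len(mapping) + 1}"
        # Match the name only when it is not part of a larger word.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        simplified, count = re.subn(pattern, placeholder, simplified)
        if count:
            mapping[placeholder] = name
    return simplified, mapping


def restore(text, mapping):
    """Map placeholder tokens back to the original protein names."""
    return re.sub(
        r"PROTEIN\d+",
        lambda m: mapping.get(m.group(0), m.group(0)),
        text,
    )


if __name__ == "__main__":
    sentence = "The retinoblastoma protein binds to cyclin D1 in vitro."
    names = ["cyclin D1", "retinoblastoma protein"]
    simplified, mapping = simplify(sentence, names)
    print(simplified)
    print(restore(simplified, mapping))
```

Each multi-word protein name becomes one token, so a general-purpose parser sees a shorter sentence with fewer unfamiliar terms. Interactions found in the parse can then be mapped back to the original names through `mapping`.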