Extracting protein-protein interactions in biomedical literature using an existing syntactic parser

Authors:
Hyunchul Jang;Jaesoo Lim;Joon-Ho Lim;Soo-Jun Park;Seon-Hee Park;Kyu-Chul Lee
Affiliations:
Bioinformatics Research Team, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Republic of Korea;Bioinformatics Research Team, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Republic of Korea;Bioinformatics Research Team, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Republic of Korea;Bioinformatics Research Team, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Republic of Korea;Bioinformatics Research Team, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Republic of Korea;Department of Computer Engineering, Chungnam National University, Daejeon, Republic of Korea
Venue:
KDLL'06 Proceedings of the 2006 international conference on Knowledge Discovery in Life Science Literature
Year:
2006

Abstract

We are developing an information extraction system for life science literature. We currently focus on PubMed abstracts and aim to extract named entities and their relationships, especially protein names and protein-protein interactions. We adopt methods from natural language processing, machine learning, and text processing, but we are not developing a new tagging or parsing technique. Developing a new tagger or parser specialized for life science literature is a very complex job, and it is not easy to get good results by tuning an existing parser or by training it without a sufficient corpus. These are separate research topics; our goal is to extract information, not to build tools that support the extraction task. In this paper, we introduce our method for using an existing full parser without training or tuning. After tagging sentences and extracting protein names, we simplify each sentence by substituting some words, such as named entities and nouns, with a single word. This sentence simplification reduces parsing errors and increases parsing precision. We then parse the simplified sentences with an existing syntactic parser and extract protein-protein interactions from its output. We show the effects of sentence simplification and syntactic parsing.
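
The sentence simplification step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the substitution rules, so the function names `simplify` and `restore`, the `PROT<n>` placeholder scheme, and the example sentence are all assumptions made for illustration. The sketch only shows the general idea of collapsing each multi-word protein name into one token before a sentence is handed to an off-the-shelf parser, and of mapping the tokens back afterwards.

```python
import re


def simplify(sentence, proteins):
    """Replace each protein mention with a single placeholder token.

    Hypothetical helper: the PROT<n> naming is an assumption, not the
    scheme used in the paper.
    """
    mapping = {}
    simplified = sentence
    # Longest names first, so a long name is replaced before any shorter
    # name that happens to be contained in it.
    ordered = sorted(proteins, key=len, reverse=True)
    for i, name in enumerate(ordered, start=1):
        placeholder = f"PROT{i}"
        # Match the name only when it is not glued to other word characters.
        pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)")
        if pattern.search(simplified):
            simplified = pattern.sub(placeholder, simplified)
            mapping[placeholder] = name
    return simplified, mapping


def restore(text, mapping):
    """Map placeholder tokens back to the original protein names."""
    return re.sub(
        r"\bPROT\d+\b",
        lambda m: mapping.get(m.group(0), m.group(0)),
        text,
    )


if __name__ == "__main__":
    sentence = "Interleukin-2 receptor alpha binds to protein kinase C in T cells."
    proteins = ["protein kinase C", "Interleukin-2 receptor alpha"]
    simplified, mapping = simplify(sentence, proteins)
    print(simplified)  # PROT1 binds to PROT2 in T cells.
    print(mapping)
```

The simplified sentence is what would be passed to the existing parser; once interactions are read off the parse, `restore` recovers the original names from the placeholder tokens.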